ChainerCV

ChainerCV is a deep learning based computer vision library built on top of Chainer.

Installation Guide

Pip

You can install ChainerCV using pip.

pip install -U numpy
pip install chainercv

Anaconda

Build instructions using Anaconda are as follows.

# For python 3
# wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh

bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
conda config --set always_yes yes --set changeps1 no
conda update -q conda

# Download ChainerCV and go to the root directory of ChainerCV
git clone https://github.com/chainer/chainercv
cd chainercv
conda env create -f environment.yml
source activate chainercv

# Install ChainerCV
pip install -e .

# Try our demos at examples/* !

ChainerCV Tutorial

Object Detection Tutorial

This tutorial will walk you through the features related to object detection that ChainerCV supports. We assume that readers have a basic understanding of the Chainer framework (e.g. understand chainer.Link). For users new to Chainer, please first read Introduction to Chainer.

In ChainerCV, we define the object detection task as the problem of localizing objects with bounding boxes and categorizing them in a given image. ChainerCV supports the task by providing the following features:

  • Visualization

  • BboxDataset

  • Detection Link

  • DetectionEvaluator

  • Training script for various detection models

Here is a short example that conducts inference and visualizes the output. Please download the image from the link below and save it as sample.jpg. https://cloud.githubusercontent.com/assets/2062128/26187667/9cb236da-3bd5-11e7-8bcf-7dbd4302e2dc.jpg

# In the rest of the tutorial, we assume that `plt`
# is imported before every code snippet.
import matplotlib.pyplot as plt

from chainercv.datasets import voc_bbox_label_names
from chainercv.links import SSD300
from chainercv.utils import read_image
from chainercv.visualizations import vis_bbox

# Read an RGB image and return it in CHW format.
img = read_image('sample.jpg')
model = SSD300(pretrained_model='voc0712')
bboxes, labels, scores = model.predict([img])
vis_bbox(img, bboxes[0], labels[0], scores[0],
         label_names=voc_bbox_label_names)
plt.show()
[Figure: detection_tutorial_link_simple.png]

Bounding boxes in ChainerCV

Bounding boxes in an image are represented as a two-dimensional array of shape \((R, 4)\), where \(R\) is the number of bounding boxes and the second axis corresponds to the coordinates of a bounding box. The coordinates are ordered in the array as (y_min, x_min, y_max, x_max), where (y_min, x_min) and (y_max, x_max) are the (y, x) coordinates of the top-left and the bottom-right vertices. Notice that ChainerCV orders coordinates in yx order, which is the opposite of the xy convention used by other libraries such as OpenCV. This convention is adopted because it is more consistent with the memory order of an image, which follows row-column order. Also, the dtype of a bounding box array is numpy.float32.

Here is an example with simple toy data.

from chainercv.visualizations import vis_bbox
import numpy as np

img = np.zeros((3, 224, 224), dtype=np.float32)
# We refer to an array of bounding boxes as `bbox` throughout the library.
bbox = np.array([[10, 10, 20, 40], [150, 150, 200, 200]], dtype=np.float32)

vis_bbox(img, bbox)
plt.show()
[Figure: detection_tutorial_simple_bbox.png]

In this example, two bounding boxes are displayed on top of a black image. vis_bbox() is a utility function that visualizes bounding boxes and an image together.

Bounding Box Dataset

ChainerCV supports dataset loaders, which can be used to easily index examples with list-like interfaces. Dataset classes whose names end with BboxDataset contain annotations of where objects are located in an image and which categories they are assigned to. These datasets can be indexed to return a tuple of an image, bounding boxes and labels. The labels are stored in an np.int32 array of shape \((R,)\). Each element corresponds to the label of the object in the corresponding bounding box.

A mapping between an integer label and a category differs between datasets. This mapping can be obtained from objects whose names end with label_names, such as voc_bbox_label_names. These mappings become helpful when bounding boxes need to be visualized with label names. The next example illustrates the interface of a BboxDataset and the use of vis_bbox() to visualize label names.

from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.visualizations import vis_bbox

dataset = VOCBboxDataset(year='2012')
img, bbox, label = dataset[0]
print(bbox.shape)  # (2, 4)
print(label.shape)  # (2,)
vis_bbox(img, bbox, label, label_names=voc_bbox_label_names)
plt.show()
[Figure: detection_tutorial_bbox_dataset_vis.png]

Note that the example downloads the VOC 2012 dataset at runtime the first time it is used on the machine.

Detection Evaluator

ChainerCV provides functionalities that make evaluating detection links easy. They are provided at two levels: evaluator extensions and evaluation functions.

Evaluator extensions such as DetectionVOCEvaluator inherit from Evaluator and have a similar interface. They are initialized with an iterator and a network that carries out prediction through the method predict(). When this class is called (i.e. __call__() of DetectionVOCEvaluator), several actions are taken. First, it iterates over the dataset based on the iterator. Second, the network makes predictions on the images collected from the dataset. Finally, an evaluation function is called with the ground truth annotations and the prediction results.

In contrast to evaluators that hide details, evaluation functions such as eval_detection_voc() are provided for those who need a finer level of control. These functions take the ground truth annotations and prediction results as arguments and return measured performance.
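For those who want this finer control, here is a minimal sketch that calls eval_detection_voc() directly on toy arrays; the array shapes follow the tables in the Evaluations reference below, and the values are made up for illustration.

import numpy as np

from chainercv.evaluations import eval_detection_voc

# One image with a single predicted box and a single ground truth box.
pred_bboxes = [np.array([[10, 10, 20, 40]], dtype=np.float32)]
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]
gt_bboxes = [np.array([[12, 12, 22, 42]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]

result = eval_detection_voc(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(result['map'])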

Here is a simple example that uses a detection evaluator.

from chainer.iterators import SerialIterator
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator
from chainercv.links import SSD300

# Only use a subset of the dataset so that evaluation finishes quickly.
dataset = VOCBboxDataset(year='2007', split='test')
dataset = dataset[:6]
it = SerialIterator(dataset, 2, repeat=False, shuffle=False)
model = SSD300(pretrained_model='voc0712')
evaluator = DetectionVOCEvaluator(it, model,
                                  label_names=voc_bbox_label_names)
# result is a dictionary of evaluation scores. Print it and check it.
result = evaluator()


Sliceable Dataset

This tutorial will walk you through the features related to sliceable datasets. We assume that readers have a basic understanding of Chainer datasets (e.g. understand chainer.dataset.DatasetMixin).

In ChainerCV, we introduce the sliceable feature to datasets. Sliceable datasets support slice(), which returns a view of the dataset.

This example shows the basic usage.

# VOCBboxDataset supports sliceable feature
from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# keys returns the names of data
print(dataset.keys)  # ('img', 'bbox', 'label')
# we can get an example by []
img, bbox, label = dataset[0]

# get a view of the first 100 examples
view = dataset.slice[:100]
print(len(view))  # 100

# get a view of image and label
view = dataset.slice[:, ('img', 'label')]
# the view is also sliceable, so we can access keys
print(view.keys)  # ('img', 'label')
# we can get an example by []
img, label = view[0]

Motivation

slice() returns a view of the dataset without loading any data, whereas DatasetMixin.__getitem__() calls get_example() for all required examples. Users can write efficient code using these views.

This example counts the number of images that contain dogs. With the sliceable feature, we can access the label information without loading the images from disk. Therefore, the first case runs faster.

import time

from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names

dataset = VOCBboxDataset()
dog_lb = voc_bbox_label_names.index('dog')

# with slice
t = time.time()
count = 0
# get a view of label
view = dataset.slice[:, 'label']
for i in range(len(view)):
    # we can focus on label
    label = view[i]
    if dog_lb in label:
        count += 1
print('w/ slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()

# without slice
t = time.time()
count = 0
for i in range(len(dataset)):
    # img and bbox are loaded but not needed
    img, bbox, label = dataset[i]
    if dog_lb in label:
        count += 1
print('w/o slice: {} secs'.format(time.time() - t))
print('{} images contain dogs'.format(count))
print()

Usage: slice along the axis of examples

slice() takes indices of examples as its first argument.

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of the first 100 examples
view = dataset.slice[:100]

# the view of the last 100 examples
view = dataset.slice[-100:]

# the view of the 3rd, 5th, and 7th examples
view = dataset.slice[3:8:2]

# the view of the 3rd, 1st, and 4th examples
view = dataset.slice[[3, 1, 4]]

Also, it can take a list of booleans as its first argument. Note that the length of the list should be the same as len(dataset).

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# make booleans
bboxes = dataset.slice[:, 'bbox']
booleans = [len(bbox) >= 3 for bbox in bboxes]

# a collection of samples that contain at least three bounding boxes
view = dataset.slice[booleans]

Usage: slice along the axis of data

slice() takes names or indices of data as its second argument. keys returns all available names.

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of image
# note that : of the first argument means all examples
view = dataset.slice[:, 'img']
print(view.keys)  # 'img'
img = view[0]

# the view of image and label
view = dataset.slice[:, ('img', 'label')]
print(view.keys)  # ('img', 'label')
img, label = view[0]

# the view of image (returns a tuple)
view = dataset.slice[:, ('img',)]
print(view.keys)  # ('img',)
img, = view[0]

# use an index instead of a name
view = dataset.slice[:, 1]
print(view.keys)  # 'bbox'
bbox = view[0]

# mixture of names and indices
view = dataset.slice[:, (1, 'label')]
print(view.keys)  # ('bbox', 'label')
bbox, label = view[0]

# use booleans
# note that the number of booleans should be the same as len(dataset.keys)
view = dataset.slice[:, (True, True, False)]
print(view.keys)  # ('img', 'bbox')
img, bbox = view[0]

Usage: slice along both axes

from chainercv.datasets import VOCBboxDataset
dataset = VOCBboxDataset()

# the view of the labels of the first 100 examples
view = dataset.slice[:100, 'label']

Concatenate and transform

ChainerCV provides ConcatenatedDataset and TransformDataset. The difference from chainer.datasets.ConcatenatedDataset and chainer.datasets.TransformDataset is that they take sliceable dataset(s) and return a sliceable dataset.

from chainercv.chainer_experimental.datasets.sliceable import ConcatenatedDataset
from chainercv.chainer_experimental.datasets.sliceable import TransformDataset
from chainercv.datasets import VOCBboxDataset
from chainercv.datasets import voc_bbox_label_names

dataset_07 = VOCBboxDataset(year='2007')
print('07:', dataset_07.keys, len(dataset_07))  # 07: ('img', 'bbox', 'label') 2501

dataset_12 = VOCBboxDataset(year='2012')
print('12:', dataset_12.keys, len(dataset_12))  # 12: ('img', 'bbox', 'label') 5717

# concatenate
dataset_0712 = ConcatenatedDataset(dataset_07, dataset_12)
print('0712:', dataset_0712.keys, len(dataset_0712))  # 0712: ('img', 'bbox', 'label') 8218

# transform
def transform(in_data):
    img, bbox, label = in_data

    dog_lb = voc_bbox_label_names.index('dog')
    bbox_dog = bbox[label == dog_lb]

    return img, bbox_dog

# we need to specify the names of data that the transform function returns
dataset_0712_dog = TransformDataset(dataset_0712, ('img', 'bbox_dog'), transform)
print('0712_dog:', dataset_0712_dog.keys, len(dataset_0712_dog))  # 0712_dog: ('img', 'bbox_dog') 8218

Make your own dataset

ChainerCV provides GetterDataset to construct a new sliceable dataset.

This example implements a sliceable bounding box dataset.

import numpy as np

from chainercv.chainer_experimental.datasets.sliceable import GetterDataset
from chainercv.utils import generate_random_bbox

class SampleBboxDataset(GetterDataset):
    def __init__(self):
        super(SampleBboxDataset, self).__init__()

        # register getter method for image
        self.add_getter('img', self.get_image)
        # register getter method for bbox and label
        self.add_getter(('bbox', 'label'), self.get_annotation)

    def __len__(self):
        return 20

    def get_image(self, i):
        print('get_image({})'.format(i))
        # generate dummy image
        img = np.random.uniform(0, 255, size=(3, 224, 224)).astype(np.float32)
        return img

    def get_annotation(self, i):
        print('get_annotation({})'.format(i))
        # generate dummy annotations
        bbox = generate_random_bbox(10, (224, 224), 10, 224)
        label = np.random.randint(0, 9, size=10).astype(np.int32)
        return bbox, label

dataset = SampleBboxDataset()
img, bbox, label = dataset[0]  # get_image(0) and get_annotation(0)

view = dataset.slice[:, 'label']
label = view[1]  # get_annotation(1)

If you have arrays of data, you can use TupleDataset.

import numpy as np

from chainercv.chainer_experimental.datasets.sliceable import TupleDataset
from chainercv.utils import generate_random_bbox

n = 20
imgs = np.random.uniform(0, 255, size=(n, 3, 224, 224)).astype(np.float32)
bboxes = [generate_random_bbox(10, (224, 224), 10, 224) for _ in range(n)]
labels = np.random.randint(0, 9, size=(n, 10)).astype(np.int32)

dataset = TupleDataset(('img', imgs), ('bbox', bboxes), ('label', labels))

print(dataset.keys)  # ('img', 'bbox', 'label')
view = dataset.slice[:, 'label']
label = view[1]

ChainerCV Reference Manual

Chainer Experimental

This module contains work-in-progress (WIP) modules for Chainer. After they are merged into Chainer, they will be removed from ChainerCV.

Datasets

Sliceable

This module supports the sliceable feature. Please note that this module will be removed after Chainer implements the sliceable feature.

ConcatenatedDataset
class chainercv.chainer_experimental.datasets.sliceable.ConcatenatedDataset(*datasets)[source]

A sliceable version of chainer.datasets.ConcatenatedDataset.

Here is an example.

>>> dataset_a = TupleDataset([0, 1, 2], [0, 1, 4])
>>> dataset_b = TupleDataset([3, 4, 5], [9, 16, 25])
>>>
>>> dataset = ConcatenatedDataset(dataset_a, dataset_b)
>>> dataset.slice[:, 0][:]  # [0, 1, 2, 3, 4, 5]
Parameters

datasets – The underlying datasets. Each dataset should inherit SliceableDataset and should have the same keys.

get_example_by_keys(index, key_indices)[source]

Return data of an example by keys

Parameters
  • index (int) – An index of an example.

  • key_indices (tuple of ints) – A tuple of indices of requested keys.

Returns

tuple of data

property keys

Return names of all keys

Returns

string or tuple of strings

GetterDataset
class chainercv.chainer_experimental.datasets.sliceable.GetterDataset[source]

A sliceable dataset class that is defined with getters.

Please refer to the tutorial for a more detailed explanation.

Here is an example.

>>> class SliceableLabeledImageDataset(GetterDataset):
>>>     def __init__(self, pairs, root='.'):
>>>         super(SliceableLabeledImageDataset, self).__init__()
>>>         with open(pairs) as f:
>>>             self._pairs = [l.split() for l in f]
>>>         self._root = root
>>>
>>>         self.add_getter('img', self.get_image)
>>>         self.add_getter('label', self.get_label)
>>>
>>>     def __len__(self):
>>>         return len(self._pairs)
>>>
>>>     def get_image(self, i):
>>>         path, _ = self._pairs[i]
>>>         return read_image(os.path.join(self._root, path))
>>>
>>>     def get_label(self, i):
>>>         _, label = self._pairs[i]
>>>         return np.int32(label)
>>>
>>> dataset = SliceableLabeledImageDataset('list.txt')
>>>
>>> # get a subset with label = 0, 1, 2
>>> # no images are loaded
>>> indices = [i for i, label in
...            enumerate(dataset.slice[:, 'label']) if label in {0, 1, 2}]
>>> dataset_012 = dataset.slice[indices]
add_getter(keys, getter)[source]

Register a getter function

Parameters
  • keys (string or tuple of strings) – The name(s) of data that the getter function returns.

  • getter (callable) – A getter function that takes an index and returns data of the corresponding example.

get_example_by_keys(index, key_indices)[source]

Return data of an example by keys

Parameters
  • index (int) – An index of an example.

  • key_indices (tuple of ints) – A tuple of indices of requested keys.

Returns

tuple of data

property keys

Return names of all keys

Returns

string or tuple of strings

TupleDataset
class chainercv.chainer_experimental.datasets.sliceable.TupleDataset(*datasets)[source]

A sliceable version of chainer.datasets.TupleDataset.

Here is an example.

>>> # omit keys
>>> dataset = TupleDataset([0, 1, 2], [0, 1, 4])
>>> dataset.keys  # (None, None)
>>> dataset.slice[:, 0][:]  # [0, 1, 2]
>>>
>>> dataset_more = TupleDataset(dataset, [0, 1, 8])
>>> dataset_more.keys  # (None, None, None)
>>> dataset_more.slice[:, [1, 2]][:]  # [(0, 0), (1, 1), (4, 8)]
>>>
>>> # specify the name of a key
>>> named_dataset = TupleDataset(('feat0', [0, 1, 2]), [0, 1, 4])
>>> named_dataset.keys  # ('feat0', None)
>>> # slice takes both key and index (or their mixture)
>>> named_dataset.slice[:, ['feat0', 1]][:]  # [(0, 0), (1, 1), (2, 4)]
Parameters

datasets

The underlying datasets. The following datasets are acceptable.

  • An inheritance of SliceableDataset.

  • A tuple of a name and a data array. The data array should be a list or numpy.ndarray.

  • A data array. In this case, the name of key is None.

get_example_by_keys(index, key_indices)[source]

Return data of an example by keys

Parameters
  • index (int) – An index of an example.

  • key_indices (tuple of ints) – A tuple of indices of requested keys.

Returns

tuple of data

property keys

Return names of all keys

Returns

string or tuple of strings

TransformDataset
class chainercv.chainer_experimental.datasets.sliceable.TransformDataset(dataset, keys, transform=None)[source]

A sliceable version of chainer.datasets.TransformDataset.

Note that it requires keys to determine the names of returned values.

Here is an example.

>>> def transform(in_data):
>>>     img, bbox, label = in_data
>>>     ...
>>>     return new_img, new_label
>>>
>>> dataset = TransformDataset(dataset, ('img', 'label'), transform)
>>> dataset.keys  # ('img', 'label')
Parameters
  • dataset – The underlying dataset. This dataset should have __len__() and __getitem__().

  • keys (string or tuple of strings) – The name(s) of data that the transform function returns. If this parameter is omitted, __init__() fetches a sample from the underlying dataset to determine the number of data.

  • transform (callable) – A function that is called to transform values returned by the underlying dataset’s __getitem__().

Training

Extensions
make_shift
chainercv.chainer_experimental.training.extensions.make_shift(attr, optimizer=None)[source]

Decorator to make shift extensions.

This decorator wraps a function and makes a shift extension. The base function should take the trainer and return a new value for attr.

Here is an example.

>>> # define an extension that updates 'lr' attribute
>>> @make_shift('lr')
>>> def warmup(trainer):
>>>     base_lr = 0.01
>>>     rate = 0.1
>>>
>>>     iteration = trainer.updater.iteration
>>>     if iteration < 1000:
>>>         return base_lr * (rate + (1 - rate) * iteration / 1000)
>>>     else:
>>>         return base_lr
>>>
>>> # use the extension
>>> trainer.extend(warmup)
Parameters
  • attr (str) – Name of the attribute to shift.

  • optimizer (Optimizer) – Target optimizer to adjust the attribute. If it is None, the main optimizer of the updater is used.

Datasets

General datasets

DirectoryParsingLabelDataset
class chainercv.datasets.DirectoryParsingLabelDataset(root, check_img_file=None, color=True, numerical_sort=False)[source]

A label dataset whose label names are the names of the subdirectories.

The label names are the names of the directories that are located one layer below the root directory. All images located under these subdirectories will be categorized into classes named after their subdirectory. An image is parsed only when the function check_img_file returns True given the path to the image as an argument. If check_img_file is None, any path with an image extension will be parsed.

Example

A directory structure should be one like below.

root
|-- class_0
|   |-- img_0.png
|   |-- img_1.png
|
`-- class_1
    |-- img_0.png
>>> from chainercv.datasets import DirectoryParsingLabelDataset
>>> dataset = DirectoryParsingLabelDataset('root')
>>> dataset.img_paths
['root/class_0/img_0.png', 'root/class_0/img_1.png',
'root/class_1/img_0.png']
>>> dataset.labels
array([0, 0, 1])
Parameters
  • root (string) – The root directory.

  • check_img_file (callable) – A function to determine if a file should be included in the dataset.

  • color (bool) – If True, this dataset reads images as color images. The default value is True.

  • numerical_sort (bool) – Label names are sorted numerically. This means that label 2 is before label 10, which is not the case when string sort is used. Regardless of this option, string sort is used for the order of files with the same label. The default value is False.

This dataset returns the following data.

name  | shape             | dtype   | format
img   | \((3, H, W)\) [1] | float32 | RGB, \([0, 255]\)
label | scalar            | int32   | \([0, \#class - 1]\)

[1] \((1, H, W)\) if color = False.
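As a sketch, check_img_file can be any predicate on file paths. The following hypothetical predicate restricts the dataset to PNG files.

from chainercv.datasets import DirectoryParsingLabelDataset

# Hypothetical predicate: accept only PNG files.
def check_png(path):
    return path.lower().endswith('.png')

dataset = DirectoryParsingLabelDataset('root', check_img_file=check_png)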

directory_parsing_label_names
chainercv.datasets.directory_parsing_label_names(root, numerical_sort=False)[source]

Get label names from the directory names.

The label names are the names of the directories that are located one layer below the root directory.

The label names can be used together with DirectoryParsingLabelDataset. The index of a label name corresponds to the label id that is used by the dataset to refer to the label.

Parameters
  • root (string) – The root directory.

  • numerical_sort (bool) – Label names are sorted numerically. This means that label 2 is before label 10, which is not the case when string sort is used. The default value is False.

Returns

Sorted names of classes.

Return type

list of strings
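For example, the returned names can decode the integer labels of a DirectoryParsingLabelDataset built on the same root (a sketch reusing the directory layout above).

from chainercv.datasets import DirectoryParsingLabelDataset
from chainercv.datasets import directory_parsing_label_names

label_names = directory_parsing_label_names('root')
dataset = DirectoryParsingLabelDataset('root')
img, label = dataset[0]
print(label_names[label])  # e.g. 'class_0'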

MixUpSoftLabelDataset
class chainercv.datasets.MixUpSoftLabelDataset(dataset, n_class, alpha=1.0)[source]

Dataset which returns mixed images and labels for mixup learning [2].

MixUpSoftLabelDataset mixes two pairs of labeled images fetched from the base dataset.

Unlike LabeledImageDatasets, label is a one-dimensional float array with at most two nonnegative weights (i.e. soft label). The sum of the two weights is one.

Example

We construct a mixup dataset from MNIST.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> from chainercv.datasets import MixUpSoftLabelDataset
>>> mnist, _ = get_mnist()
>>> base_dataset = SiameseDataset(mnist, mnist)
>>> dataset = MixUpSoftLabelDataset(base_dataset, 10)
>>> mixed_image, mixed_label = dataset[0]
>>> mixed_label.shape
(10,)
>>> mixed_label.dtype
dtype('float32')
Parameters
  • dataset

    The underlying dataset. The dataset returns img_0, label_0, img_1, label_1, which is a tuple containing two pairs of an image and a label. Typically, dataset is SiameseDataset.

    The shapes of images and labels should be constant.

  • n_class (int) – The number of classes in the base dataset.

  • alpha (float) – A hyperparameter of the Beta distribution. mix_ratio is sampled from \(B(\alpha, \alpha)\). The default value is \(1.0\), meaning that the distribution is the same as the uniform distribution on \([0, 1]\).

[2] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz. mixup: Beyond Empirical Risk Minimization. arXiv 2017.

This dataset returns the following data.

name  | shape          | dtype   | format
img   | [3]            | [3]     | [3]
label | \((\#class,)\) | float32 | \([0, 1]\)

[3] Same as dataset.
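Conceptually, the mixing rule can be sketched as follows; this illustrates the math above and is not the class's internal code.

import numpy as np

def mixup(img_0, label_0, img_1, label_1, n_class, alpha=1.0):
    # mix_ratio is sampled from B(alpha, alpha)
    mix_ratio = np.random.beta(alpha, alpha)
    # mix the two images with the sampled ratio
    mixed_img = mix_ratio * img_0 + (1 - mix_ratio) * img_1
    # soft label with at most two nonnegative weights that sum to one
    mixed_label = np.zeros(n_class, dtype=np.float32)
    mixed_label[label_0] += mix_ratio
    mixed_label[label_1] += 1 - mix_ratio
    return mixed_img, mixed_label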

SiameseDataset
class chainercv.datasets.SiameseDataset(dataset_0, dataset_1, pos_ratio=None, length=None, labels_0=None, labels_1=None)[source]

A dataset that returns samples fetched from two datasets.

The dataset returns samples from the two base datasets. If pos_ratio is not None, SiameseDataset can be configured to return positive pairs at the ratio of pos_ratio and negative pairs at the ratio of 1 - pos_ratio. In this mode, the base datasets are assumed to be label datasets that return an image and a label as a sample.

Example

We construct a siamese dataset from MNIST.

>>> from chainer.datasets import get_mnist
>>> from chainercv.datasets import SiameseDataset
>>> mnist, _ = get_mnist()
>>> dataset = SiameseDataset(mnist, mnist, pos_ratio=0.3)
# The probability of the two samples having the same label
# is 0.3 as specified by pos_ratio.
>>> img_0, label_0, img_1, label_1 = dataset[0]
# The returned examples may change in the next
# call even if the index is the same as before
# because SiameseDataset picks examples randomly
# (e.g., img_0_new may differ from img_0).
>>> img_0_new, label_0_new, img_1_new, label_1_new = dataset[0]
Parameters
  • dataset_0 – The first base dataset.

  • dataset_1 – The second base dataset.

  • pos_ratio (float) – If this is not None, this dataset tries to construct positive pairs at the given rate. If None, this dataset randomly samples examples from the base datasets. The default value is None.

  • length (int) – The length of this dataset. If None, the length of the first base dataset is the length of this dataset.

  • labels_0 (numpy.ndarray) – The labels associated with the first base dataset. The length should be the same as the length of the first dataset. If this is None, the labels are automatically fetched using the following line of code: [ex[1] for ex in dataset_0]. By setting labels_0 and skipping the fetching iteration, the computation cost can be reduced. Also, if pos_ratio is None, this value is ignored. The default value is None. If labels_1 is specified and dataset_0 and dataset_1 are the same, labels_0 can be skipped.

  • labels_1 (numpy.ndarray) – The labels associated with the second base dataset. If labels_0 is specified and dataset_0 and dataset_1 are the same, labels_1 can be skipped. Please consult the explanation for labels_0.

This dataset returns the following data.

name    | shape  | dtype | format
img_0   | [4]    | [4]   | [4]
label_0 | scalar | int32 | \([0, \#class - 1]\)
img_1   | [5]    | [5]   | [5]
label_1 | scalar | int32 | \([0, \#class - 1]\)

[4] Same as dataset_0.
[5] Same as dataset_1.

ADE20K

ADE20KSemanticSegmentationDataset
class chainercv.datasets.ADE20KSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for ADE20K.

This is the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 20,210 training images and 2,000 validation images.

Parameters
  • data_dir (string) – Path to the dataset directory. The directory should contain the ADEChallengeData2016 directory, and that directory should contain at least the images and annotations directories. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.

  • split ({'train', 'val'}) – Select from dataset splits used in MIT Scene Parsing Benchmark dataset (ADE20K).

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
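A minimal usage sketch; the dataset is downloaded automatically on first use.

from chainercv.datasets import ADE20KSemanticSegmentationDataset

dataset = ADE20KSemanticSegmentationDataset(split='val')
img, label = dataset[0]
print(img.shape)    # (3, H, W)
print(label.shape)  # (H, W)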

ADE20KTestImageDataset
class chainercv.datasets.ADE20KTestImageDataset(data_dir='auto')[source]

Image dataset for test split of ADE20K.

This is an image dataset of the test split of the ADE20K dataset distributed on the MIT Scene Parsing Benchmark website. It has 3,352 test images.

Parameters

data_dir (string) – Path to the dataset directory. The directory should contain the release_test directory. If auto is given, the dataset is automatically downloaded into $CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k.

This dataset returns the following data.

name | shape         | dtype   | format
img  | \((3, H, W)\) | float32 | RGB, \([0, 255]\)

CamVid

CamVidDataset
class chainercv.datasets.CamVidDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for CamVid.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/camvid.

  • split ({'train', 'val', 'test'}) – Select from dataset splits used in CamVid Dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)

Cityscapes

CityscapesSemanticSegmentationDataset
class chainercv.datasets.CityscapesSemanticSegmentationDataset(data_dir='auto', label_resolution=None, split='train', ignore_labels=True)[source]

Semantic segmentation dataset for Cityscapes dataset.

Note

Please download the data manually because re-distribution of the Cityscapes dataset is not allowed.

Parameters
  • data_dir (string) – Path to the dataset directory. The directory should contain at least two directories, leftImg8bit and either gtFine or gtCoarse. If auto is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.

  • label_resolution ({'fine', 'coarse'}) – The resolution of the labels. It should be either fine or coarse.

  • split ({'train', 'val'}) – Select from dataset splits used in Cityscapes dataset.

  • ignore_labels (bool) – If True, the labels marked ignoreInEval defined in the original cityscapesScripts will be replaced with -1 in the get_example() method. The default value is True.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)

CityscapesTestImageDataset
class chainercv.datasets.CityscapesTestImageDataset(data_dir='auto')[source]

Image dataset for test split of Cityscapes dataset.

Note

Please download the data manually because re-distribution of the Cityscapes dataset is not allowed.

Parameters

data_dir (string) – Path to the dataset directory. The directory should contain the leftImg8bit directory. If auto is given, it uses $CHAINER_DATASET_ROOT/pfnet/chainercv/cityscapes by default.

This dataset returns the following data.

name | shape         | dtype   | format
img  | \((3, H, W)\) | float32 | RGB, \([0, 255]\)

CUB

CUBLabelDataset
class chainercv.datasets.CUBLabelDataset(data_dir='auto', return_bbox=False, prob_map_dir='auto', return_prob_map=False)[source]

Caltech-UCSD Birds-200-2011 dataset with annotated class labels.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_bbox (bool) – If True, this returns a bounding box around a bird. The default value is False.

  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_prob_map (bool) – If True, this dataset returns a probability map of the bird. The default value is False.

This dataset returns the following data.

name         | shape         | dtype   | format
img          | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label        | scalar        | int32   | \([0, \#class - 1]\)
bbox [6]     | \((1, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [7] | \((H, W)\)    | float32 | \([0, 1]\)

[6] bbox indicates the location of a bird. It is available if return_bbox = True.
[7] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.
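A usage sketch, assuming the returned tuple follows the row order of the table above.

from chainercv.datasets import CUBLabelDataset

# With return_bbox = True, each example also contains the bird's bounding box.
dataset = CUBLabelDataset(return_bbox=True)
img, label, bbox = dataset[0]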

CUBKeypointDataset
class chainercv.datasets.CUBKeypointDataset(data_dir='auto', return_bbox=False, prob_map_dir='auto', return_prob_map=False)[source]

Caltech-UCSD Birds-200-2011 dataset with annotated points.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_bbox (bool) – If True, this returns a bounding box around a bird. The default value is False.

  • prob_map_dir (string) – Path to the root of the probability maps. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/cub.

  • return_prob_map (bool) – If True, this dataset returns a probability map of the bird. The default value is False.

This dataset returns the following data.

name         | shape          | dtype   | format
img          | \((3, H, W)\)  | float32 | RGB, \([0, 255]\)
point        | \((1, 15, 2)\) | float32 | \((y, x)\)
visible      | \((1, 15)\)    | bool    | --
bbox [8]     | \((1, 4)\)     | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
prob_map [9] | \((H, W)\)     | float32 | \([0, 1]\)

[8] bbox indicates the location of a bird. It is available if return_bbox = True.
[9] prob_map indicates how likely a bird is located at each pixel. It is available if return_prob_map = True.

MS COCO

COCOBboxDataset
class chainercv.datasets.COCOBboxDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_area=False, return_crowded=False)[source]

Bounding box dataset for MS COCO.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.

  • year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.

  • use_crowded (bool) – If true, use bounding boxes that are labeled as crowded in the original annotation. The default value is False.

  • return_area (bool) – If true, this dataset returns areas of masks around objects. The default value is False.

  • return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether bounding boxes are labeled as crowded or not. The default value is False.

This dataset returns the following data.

name           | shape         | dtype   | format
img            | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
bbox [10]      | \((R, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [10]     | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
area [10] [11] | \((R,)\)      | float32 | --
crowded [12]   | \((R,)\)      | bool    | --

[10] If use_crowded = True, bbox, label and area contain crowded instances.
[11] area is available if return_area = True.
[12] crowded is available if return_crowded = True.

When there are more than ten objects of the same category, the bounding boxes correspond to a crowd of instances instead of individual instances. Please see Fig. 12 (e) of the summary paper [13] for more detail.

[13] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.
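A minimal usage sketch; the data is downloaded automatically on first use.

from chainercv.datasets import COCOBboxDataset

dataset = COCOBboxDataset(split='val', year='2017')
img, bbox, label = dataset[0]
print(bbox.shape)  # (R, 4)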

COCOInstanceSegmentationDataset
class chainercv.datasets.COCOInstanceSegmentationDataset(data_dir='auto', split='train', year='2017', use_crowded=False, return_crowded=False, return_area=False, return_bbox=False)[source]

Instance segmentation dataset for MS COCO.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val', 'minival', 'valminusminival'}) – Select a split of the dataset.

  • year ({'2014', '2017'}) – Use a dataset released in year. Splits minival and valminusminival are only supported in year 2014.

  • use_crowded (bool) – If true, use masks that are labeled as crowded in the original annotation.

  • return_crowded (bool) – If true, this dataset returns a boolean array that indicates whether masks are labeled as crowded or not. The default value is False.

  • return_area (bool) – If true, this dataset returns areas of masks around objects.

  • return_bbox (bool) – If true, this dataset returns bounding boxes around objects.

This dataset returns the following data.

name           | shape         | dtype   | format
img            | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
mask [14]      | \((R, H, W)\) | bool    | --
label [14]     | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
area [14] [15] | \((R,)\)      | float32 | --
crowded [16]   | \((R,)\)      | bool    | --
bbox [14]      | \((R, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)

[14] If use_crowded = True, mask, label, area and bbox contain crowded instances.
[15] area is available if return_area = True.
[16] crowded is available if return_crowded = True.

When there are more than ten objects of the same category, the masks correspond to a crowd of instances instead of individual instances. Please see Fig. 12 (e) of the summary paper [17] for more detail.

[17] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar. Microsoft COCO: Common Objects in Context. arXiv 2014.

COCOSemanticSegmentationDataset
class chainercv.datasets.COCOSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for MS COCO.

Semantic segmentations are generated from panoptic segmentations as done in the official toolkit.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/coco.

  • split ({'train', 'val'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)

OnlineProducts

OnlineProductsDataset
class chainercv.datasets.OnlineProductsDataset(data_dir='auto', split='train')[source]

Dataset class for Stanford Online Products Dataset.

The split argument selects the train and test splits of the dataset as done in [18]. The train split contains the first 11318 classes and the test split contains the remaining 11316 classes.

[18] Hyun Oh Song, Yu Xiang, Stefanie Jegelka, Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. arXiv 2015.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/online_products.

  • split ({'train', 'test'}) – Select a split of the dataset.

This dataset returns the following data.

name        | shape         | dtype   | format
img         | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label       | scalar        | int32   | \([0, \#class - 1]\)
super_label | scalar        | int32   | \([0, \#super\_class - 1]\)

PASCAL VOC

VOCBboxDataset
class chainercv.datasets.VOCBboxDataset(data_dir='auto', split='train', year='2012', use_difficult=False, return_difficult=False)[source]

Bounding box dataset for PASCAL VOC.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval', 'test'}) – Select a split of the dataset. The test split is only available for the 2007 dataset.

  • year ({'2007', '2012'}) – Use a dataset prepared for a challenge held in year.

  • use_difficult (bool) – If True, use images that are labeled as difficult in the original annotation.

  • return_difficult (bool) – If True, this dataset returns a boolean array that indicates whether bounding boxes are labeled as difficult or not. The default value is False.

This dataset returns the following data.

name                      | shape         | dtype   | format
img                       | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
bbox [19]                 | \((R, 4)\)    | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
label [19]                | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)
difficult (optional [20]) | \((R,)\)      | bool    | --

[19] If use_difficult = True, bbox and label contain difficult instances.
[20] difficult is available if return_difficult = True.

VOCInstanceSegmentationDataset
class chainercv.datasets.VOCInstanceSegmentationDataset(data_dir='auto', split='train')[source]

Instance segmentation dataset for PASCAL VOC2012.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
mask  | \((R, H, W)\) | bool    | --
label | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)

VOCSemanticSegmentationDataset
class chainercv.datasets.VOCSemanticSegmentationDataset(data_dir='auto', split='train')[source]

Semantic segmentation dataset for PASCAL VOC2012.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/voc.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
label | \((H, W)\)    | int32   | \([-1, \#class - 1]\)
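A minimal usage sketch.

from chainercv.datasets import VOCSemanticSegmentationDataset

dataset = VOCSemanticSegmentationDataset(split='val')
img, label = dataset[0]
# label is an int32 array of shape (H, W); -1 marks pixels to be ignored.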

Semantic Boundaries Dataset

SBDInstanceSegmentationDataset
class chainercv.datasets.SBDInstanceSegmentationDataset(data_dir='auto', split='train')[source]

Instance segmentation dataset for Semantic Boundaries Dataset SBD.

Parameters
  • data_dir (string) – Path to the root of the training data. If this is auto, this class will automatically download data for you under $CHAINER_DATASET_ROOT/pfnet/chainercv/sbd.

  • split ({'train', 'val', 'trainval'}) – Select a split of the dataset.

This dataset returns the following data.

name  | shape         | dtype   | format
img   | \((3, H, W)\) | float32 | RGB, \([0, 255]\)
mask  | \((R, H, W)\) | bool    | --
label | \((R,)\)      | int32   | \([0, \#fg\_class - 1]\)

Evaluations

Detection COCO

eval_detection_coco
chainercv.evaluations.eval_detection_coco(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_areas=None, gt_crowdeds=None)[source]

Evaluate detections based on evaluation code of MS COCO.

This function evaluates predicted bounding boxes obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.

Parameters
  • pred_bboxes (iterable of numpy.ndarray) – See the table below.

  • pred_labels (iterable of numpy.ndarray) – See the table below.

  • pred_scores (iterable of numpy.ndarray) – See the table below.

  • gt_bboxes (iterable of numpy.ndarray) – See the table below.

  • gt_labels (iterable of numpy.ndarray) – See the table below.

  • gt_areas (iterable of numpy.ndarray) – See the table below. If None, some scores are not returned.

  • gt_crowdeds (iterable of numpy.ndarray) – See the table below.

name        | shape        | dtype   | format
pred_bboxes | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
pred_labels | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
pred_scores | \([(R,)]\)   | float32 | --
gt_bboxes   | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
gt_labels   | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
gt_areas    | \([(R,)]\)   | float32 | --
gt_crowdeds | \([(R,)]\)   | bool    | --

All inputs should have the same length. For more detailed explanation of the inputs, please refer to chainercv.datasets.COCOBboxDataset.

Returns

The keys, value-types and the description of the values are listed below. The APs and ARs are calculated with different IoU thresholds, object sizes, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO’s official evaluation page.

key                                        | type                          | description
ap/iou=0.50:0.95/area=all/max_dets=100     | numpy.ndarray                 | [1]
ap/iou=0.50/area=all/max_dets=100          | numpy.ndarray                 | [1]
ap/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [1]
ap/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [1] [5]
ap/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [1] [5]
ap/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [1] [5]
ar/iou=0.50:0.95/area=all/max_dets=1       | numpy.ndarray                 | [2]
ar/iou=0.50/area=all/max_dets=10           | numpy.ndarray                 | [2]
ar/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [2]
ar/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [2] [5]
ar/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [2] [5]
ar/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [2] [5]
map/iou=0.50:0.95/area=all/max_dets=100    | float                         | [3]
map/iou=0.50/area=all/max_dets=100         | float                         | [3]
map/iou=0.75/area=all/max_dets=100         | float                         | [3]
map/iou=0.50:0.95/area=small/max_dets=100  | float                         | [3] [5]
map/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [3] [5]
map/iou=0.50:0.95/area=large/max_dets=100  | float                         | [3] [5]
mar/iou=0.50:0.95/area=all/max_dets=1      | float                         | [4]
mar/iou=0.50/area=all/max_dets=10          | float                         | [4]
mar/iou=0.75/area=all/max_dets=100         | float                         | [4]
mar/iou=0.50:0.95/area=small/max_dets=100  | float                         | [4] [5]
mar/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [4] [5]
mar/iou=0.50:0.95/area=large/max_dets=100  | float                         | [4] [5]
coco_eval                                  | pycocotools.cocoeval.COCOeval | result from pycocotools
existent_labels                            | numpy.ndarray                 | used labels

Return type

dict

[1] An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[2] An array of average recalls. The \(l\)-th value corresponds to the average recall for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[3] The average of average precisions over classes.

[4] The average of average recalls over classes.

[5] Skipped if gt_areas is None.
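A sketch of calling this function on toy arrays; note that it requires pycocotools, and the key names follow the table above.

import numpy as np

from chainercv.evaluations import eval_detection_coco

# One image with a single predicted box and a single ground truth box.
pred_bboxes = [np.array([[10, 10, 20, 40]], dtype=np.float32)]
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.9], dtype=np.float32)]
gt_bboxes = [np.array([[12, 12, 22, 42]], dtype=np.float32)]
gt_labels = [np.array([0], dtype=np.int32)]

result = eval_detection_coco(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(result['map/iou=0.50:0.95/area=all/max_dets=100'])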

Detection VOC

eval_detection_voc
chainercv.evaluations.eval_detection_voc(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5, use_07_metric=False)[source]

Calculate average precisions based on evaluation code of PASCAL VOC.

This function evaluates predicted bounding boxes obtained from a dataset which has \(N\) images by using average precision for each class. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters
  • pred_bboxes (iterable of numpy.ndarray) – See the table below.

  • pred_labels (iterable of numpy.ndarray) – See the table below.

  • pred_scores (iterable of numpy.ndarray) – See the table below.

  • gt_bboxes (iterable of numpy.ndarray) – See the table below.

  • gt_labels (iterable of numpy.ndarray) – See the table below.

  • gt_difficults (iterable of numpy.ndarray) – See the table below. By default, this is None. In that case, this function considers all bounding boxes to be not difficult.

  • iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

name          | shape        | dtype   | format
pred_bboxes   | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
pred_labels   | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
pred_scores   | \([(R,)]\)   | float32 | --
gt_bboxes     | \([(R, 4)]\) | float32 | \((y_{min}, x_{min}, y_{max}, x_{max})\)
gt_labels     | \([(R,)]\)   | int32   | \([0, \#fg\_class - 1]\)
gt_difficults | \([(R,)]\)   | bool    | --

Returns

The keys, value-types and the description of the values are listed below.

  • ap (numpy.ndarray): An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

  • map (float): The average of Average Precisions over classes.

Return type

dict

calc_detection_voc_ap
chainercv.evaluations.calc_detection_voc_ap(prec, rec, use_07_metric=False)[source]

Calculate average precisions based on evaluation code of PASCAL VOC.

This function calculates average precisions from given precisions and recalls. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters
  • prec (list of numpy.array) – A list of arrays. prec[l] indicates precision for class \(l\). If prec[l] is None, this function returns numpy.nan for class \(l\).

  • rec (list of numpy.array) – A list of arrays. rec[l] indicates recall for class \(l\). If rec[l] is None, this function returns numpy.nan for class \(l\).

  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

Returns

This function returns an array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If prec[l] or rec[l] is None, the corresponding value is set to numpy.nan.

Return type

ndarray

calc_detection_voc_prec_rec
chainercv.evaluations.calc_detection_voc_prec_rec(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels, gt_difficults=None, iou_thresh=0.5)[source]

Calculate precision and recall based on evaluation code of PASCAL VOC.

This function calculates precision and recall of predicted bounding boxes obtained from a dataset which has \(N\) images. The code is based on the evaluation code used in PASCAL VOC Challenge.

Parameters

The parameters are the same as those of eval_detection_voc(), except that use_07_metric is not taken.

Returns

This function returns two lists: prec and rec.

  • prec: A list of arrays. prec[l] is precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, prec[l] is set to None.

  • rec: A list of arrays. rec[l] is recall for class \(l\). If class \(l\) that is not marked as difficult does not exist in gt_labels, rec[l] is set to None.

Return type

tuple of two lists
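As a sketch, eval_detection_voc() essentially composes these two helpers; pred_bboxes and the other arrays are prepared as in the eval_detection_voc example above.

from chainercv.evaluations import calc_detection_voc_ap
from chainercv.evaluations import calc_detection_voc_prec_rec

# prec[l] and rec[l] are the precision and recall curves for class l.
prec, rec = calc_detection_voc_prec_rec(
    pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
# ap[l] is the average precision for class l.
ap = calc_detection_voc_ap(prec, rec)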

Instance Segmentation COCO

eval_instance_segmentation_coco
chainercv.evaluations.eval_instance_segmentation_coco(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, gt_areas=None, gt_crowdeds=None)[source]

Evaluate instance segmentations based on evaluation code of MS COCO.

This function evaluates predicted instance segmentations obtained from a dataset by using average precision for each class. The code is based on the evaluation code used in MS COCO.

Parameters
  • pred_masks (iterable of numpy.ndarray) – See the table below.

  • pred_labels (iterable of numpy.ndarray) – See the table below.

  • pred_scores (iterable of numpy.ndarray) – See the table below.

  • gt_masks (iterable of numpy.ndarray) – See the table below.

  • gt_labels (iterable of numpy.ndarray) – See the table below.

  • gt_areas (iterable of numpy.ndarray) – See the table below. If None, some scores are not returned.

  • gt_crowdeds (iterable of numpy.ndarray) – See the table below.

name        | shape           | dtype   | format
pred_masks  | \([(R, H, W)]\) | bool    | --
pred_labels | \([(R,)]\)      | int32   | \([0, \#fg\_class - 1]\)
pred_scores | \([(R,)]\)      | float32 | --
gt_masks    | \([(R, H, W)]\) | bool    | --
gt_labels   | \([(R,)]\)      | int32   | \([0, \#fg\_class - 1]\)
gt_areas    | \([(R,)]\)      | float32 | --
gt_crowdeds | \([(R,)]\)      | bool    | --

All inputs should have the same length. For more detailed explanation of the inputs, please refer to chainercv.datasets.COCOInstanceSegmentationDataset.

Returns

The keys, value-types and the description of the values are listed below. The APs and ARs are calculated with different IoU thresholds, object sizes, and numbers of detections per image. For more details on the 12 patterns of evaluation metrics, please refer to COCO’s official evaluation page.

key                                        | type                          | description
ap/iou=0.50:0.95/area=all/max_dets=100     | numpy.ndarray                 | [6]
ap/iou=0.50/area=all/max_dets=100          | numpy.ndarray                 | [6]
ap/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [6]
ap/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [6] [10]
ap/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [6] [10]
ap/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [6] [10]
ar/iou=0.50:0.95/area=all/max_dets=1       | numpy.ndarray                 | [7]
ar/iou=0.50/area=all/max_dets=10           | numpy.ndarray                 | [7]
ar/iou=0.75/area=all/max_dets=100          | numpy.ndarray                 | [7]
ar/iou=0.50:0.95/area=small/max_dets=100   | numpy.ndarray                 | [7] [10]
ar/iou=0.50:0.95/area=medium/max_dets=100  | numpy.ndarray                 | [7] [10]
ar/iou=0.50:0.95/area=large/max_dets=100   | numpy.ndarray                 | [7] [10]
map/iou=0.50:0.95/area=all/max_dets=100    | float                         | [8]
map/iou=0.50/area=all/max_dets=100         | float                         | [8]
map/iou=0.75/area=all/max_dets=100         | float                         | [8]
map/iou=0.50:0.95/area=small/max_dets=100  | float                         | [8] [10]
map/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [8] [10]
map/iou=0.50:0.95/area=large/max_dets=100  | float                         | [8] [10]
mar/iou=0.50:0.95/area=all/max_dets=1      | float                         | [9]
mar/iou=0.50/area=all/max_dets=10          | float                         | [9]
mar/iou=0.75/area=all/max_dets=100         | float                         | [9]
mar/iou=0.50:0.95/area=small/max_dets=100  | float                         | [9] [10]
mar/iou=0.50:0.95/area=medium/max_dets=100 | float                         | [9] [10]
mar/iou=0.50:0.95/area=large/max_dets=100  | float                         | [9] [10]
coco_eval                                  | pycocotools.cocoeval.COCOeval | result from pycocotools
existent_labels                            | numpy.ndarray                 | used labels

Return type

dict

[6] An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[7] An array of average recalls. The \(l\)-th value corresponds to the average recall for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

[8] The average of average precisions over classes.

[9] The average of average recalls over classes.

[10] Skipped if gt_areas is None.

Instance Segmentation VOC

eval_instance_segmentation_voc
chainercv.evaluations.eval_instance_segmentation_voc(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh=0.5, use_07_metric=False)[source]

Calculate average precisions based on evaluation code of PASCAL VOC.

This function evaluates predicted masks obtained from a dataset which has \(N\) images by using average precision for each class. The code is based on the evaluation code used in FCIS.

Parameters
  • pred_masks (iterable of numpy.ndarray) – See the table below.

  • pred_labels (iterable of numpy.ndarray) – See the table below.

  • pred_scores (iterable of numpy.ndarray) – See the table below.

  • gt_masks (iterable of numpy.ndarray) – See the table below.

  • gt_labels (iterable of numpy.ndarray) – See the table below.

  • iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

name        | shape           | dtype   | format
pred_masks  | \([(R, H, W)]\) | bool    | --
pred_labels | \([(R,)]\)      | int32   | \([0, \#fg\_class - 1]\)
pred_scores | \([(R,)]\)      | float32 | --
gt_masks    | \([(R, H, W)]\) | bool    | --
gt_labels   | \([(R,)]\)      | int32   | \([0, \#fg\_class - 1]\)

Returns

The keys, value-types and the description of the values are listed below.

  • ap (numpy.ndarray): An array of average precisions. The \(l\)-th value corresponds to the average precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

  • map (float): The average of Average Precisions over classes.

Return type

dict
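A sketch with toy masks.

import numpy as np

from chainercv.evaluations import eval_instance_segmentation_voc

# One image with a single predicted mask that exactly matches the ground truth.
pred_masks = [np.zeros((1, 32, 32), dtype=bool)]
pred_masks[0][0, 8:16, 8:16] = True
pred_labels = [np.array([0], dtype=np.int32)]
pred_scores = [np.array([0.8], dtype=np.float32)]
gt_masks = [pred_masks[0].copy()]
gt_labels = [np.array([0], dtype=np.int32)]

result = eval_instance_segmentation_voc(
    pred_masks, pred_labels, pred_scores, gt_masks, gt_labels)
print(result['map'])  # 1.0 for this perfect match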

calc_instance_segmentation_voc_prec_rec
chainercv.evaluations.calc_instance_segmentation_voc_prec_rec(pred_masks, pred_labels, pred_scores, gt_masks, gt_labels, iou_thresh)[source]

Calculate precision and recall based on evaluation code of PASCAL VOC.

This function calculates precision and recall of predicted masks obtained from a dataset which has \(N\) images. The code is based on the evaluation code used in FCIS.

Parameters
  • pred_masks (iterable of numpy.ndarray) – An iterable of \(N\) sets of masks. Its index corresponds to an index for the base dataset. Each element of pred_masks is an object mask and is an array whose shape is \((R, H, W)\), where \(R\) corresponds to the number of masks, which may vary among images.

  • pred_labels (iterable of numpy.ndarray) – An iterable of labels. Similar to pred_masks, its index corresponds to an index for the base dataset. Its length is \(N\).

  • pred_scores (iterable of numpy.ndarray) – An iterable of confidence scores for predicted masks. Similar to pred_masks, its index corresponds to an index for the base dataset. Its length is \(N\).

  • gt_masks (iterable of numpy.ndarray) – An iterable of ground truth masks whose length is \(N\). An element of gt_masks is an object mask whose shape is \((R, H, W)\). Note that the number of masks \(R\) in each image does not need to be same as the number of corresponding predicted masks.

  • gt_labels (iterable of numpy.ndarray) – An iterable of ground truth labels which are organized similarly to gt_masks. Its length is \(N\).

  • iou_thresh (float) – A prediction is correct if its Intersection over Union with the ground truth is above this value.

Returns

This function returns two lists: prec and rec.

  • prec: A list of arrays. prec[l] is precision for class \(l\). If class \(l\) does not exist in either pred_labels or gt_labels, prec[l] is set to None.

  • rec: A list of arrays. rec[l] is recall for class \(l\). If class \(l\) does not appear in gt_labels, rec[l] is set to None.

Return type

tuple of two lists

Semantic Segmentation IoU

eval_semantic_segmentation
chainercv.evaluations.eval_semantic_segmentation(pred_labels, gt_labels)[source]

Evaluate metrics used in Semantic Segmentation.

This function calculates Intersection over Union (IoU), Pixel Accuracy and Class Accuracy for the task of semantic segmentation.

The definition of metrics calculated by this function is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

  • \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)

  • \(\text{mIoU} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)

  • \(\text{Pixel Accuracy} = \frac {\sum_{i=1}^k N_{ii}} {\sum_{i=1}^k \sum_{j=1}^k N_{ij}}\)

  • \(\text{Class Accuracy} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)

  • \(\text{Mean Class Accuracy} = \frac{1}{k} \sum_{i=1}^k \frac{N_{ii}}{\sum_{j=1}^k N_{ij}}\)

A more detailed description of the above metrics can be found in a review on semantic segmentation 11.

The number of classes \(n\_class\) is \(max(pred\_labels, gt\_labels) + 1\), that is, the maximum class id appearing in the inputs plus one.

11

Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017.

Parameters
  • pred_labels (iterable of numpy.ndarray) – See the table below.

  • gt_labels (iterable of numpy.ndarray) – See the table below.

name         shape         dtype   format
pred_labels  \([(H, W)]\)  int32   \([0, \#class - 1]\)
gt_labels    \([(H, W)]\)  int32   \([-1, \#class - 1]\)

Returns

The keys, value-types and the description of the values are listed below.

  • iou (numpy.ndarray): An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\).

  • miou (float): The average of IoUs over classes.

  • pixel_accuracy (float): The computed pixel accuracy.

  • class_accuracy (numpy.ndarray): An array of class accuracies for the \(n\_class\) classes. Its shape is \((n\_class,)\).

  • mean_class_accuracy (float): The average of class accuracies.

Return type

dict
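
A minimal sketch of calling this function on toy labels; the arrays below are made-up data, not part of the API.

import numpy as np
from chainercv.evaluations import eval_semantic_segmentation

# A single 2x3 image with two classes; one pixel is mispredicted.
gt = np.array([[0, 0, 1],
               [0, 1, 1]], dtype=np.int32)
pred = np.array([[0, 1, 1],
                 [0, 1, 1]], dtype=np.int32)

result = eval_semantic_segmentation([pred], [gt])
print(result['iou'])             # per-class IoUs
print(result['miou'])            # their mean
print(result['pixel_accuracy'])  # fraction of correctly labeled pixels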

calc_semantic_segmentation_confusion
chainercv.evaluations.calc_semantic_segmentation_confusion(pred_labels, gt_labels)[source]

Collect a confusion matrix.

The number of classes \(n\_class\) is \(max(pred\_labels, gt\_labels) + 1\), that is, the maximum class id appearing in the inputs plus one.

Parameters
  • pred_labels (iterable of numpy.ndarray) – A collection of predicted labels. See the table for eval_semantic_segmentation() above.

  • gt_labels (iterable of numpy.ndarray) – A collection of ground truth labels. See the table for eval_semantic_segmentation() above.

Returns

A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

Return type

numpy.ndarray

calc_semantic_segmentation_iou
chainercv.evaluations.calc_semantic_segmentation_iou(confusion)[source]

Calculate Intersection over Union with a given confusion matrix.

The definition of Intersection over Union (IoU) is as follows, where \(N_{ij}\) is the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

  • \(\text{IoU of the i-th class} = \frac{N_{ii}}{\sum_{j=1}^k N_{ij} + \sum_{j=1}^k N_{ji} - N_{ii}}\)

Parameters

confusion (numpy.ndarray) – A confusion matrix. Its shape is \((n\_class, n\_class)\). The \((i, j)\) th element corresponds to the number of pixels that are labeled as class \(i\) by the ground truth and class \(j\) by the prediction.

Returns

An array of IoUs for the \(n\_class\) classes. Its shape is \((n\_class,)\).

Return type

numpy.ndarray
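
The two helpers above compose into the IoU part of eval_semantic_segmentation(). A minimal sketch with toy labels (made-up data, not part of the API):

import numpy as np
from chainercv.evaluations import calc_semantic_segmentation_confusion
from chainercv.evaluations import calc_semantic_segmentation_iou

gt = np.array([[0, 0, 1],
               [0, 1, 1]], dtype=np.int32)
pred = np.array([[0, 1, 1],
                 [0, 1, 1]], dtype=np.int32)

# confusion[i, j] counts pixels labeled i by the ground truth
# and j by the prediction.
confusion = calc_semantic_segmentation_confusion([pred], [gt])
iou = calc_semantic_segmentation_iou(confusion)
print(iou)  # per-class IoUs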

Experimental

Extensions

Evaluator

DetectionCOCOEvaluator
class chainercv.extensions.DetectionCOCOEvaluator(iterator, target, label_names=None, comm=None)[source]

An extension that evaluates a detection model by MS COCO metric.

This extension iterates over an iterator and evaluates the prediction results. The results consist of average precisions (APs) and average recalls (ARs) as well as the mean of each (mean average precision and mean average recall). This extension reports the following values with keys. Please note that if label_names is not specified, only the mAPs and mARs are reported.

The underlying dataset of the iterator is assumed to return img, bbox, label or img, bbox, label, area, crowded.

key                                                         description
ap/iou=0.50:0.95/area=all/max_dets=100/<label_names[l]>     note 1
ap/iou=0.50/area=all/max_dets=100/<label_names[l]>          note 1
ap/iou=0.75/area=all/max_dets=100/<label_names[l]>          note 1
ap/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]>   notes 1, 5
ap/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]>  notes 1, 5
ap/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]>   notes 1, 5
ar/iou=0.50:0.95/area=all/max_dets=1/<label_names[l]>       note 2
ar/iou=0.50/area=all/max_dets=10/<label_names[l]>           note 2
ar/iou=0.75/area=all/max_dets=100/<label_names[l]>          note 2
ar/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]>   notes 2, 5
ar/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]>  notes 2, 5
ar/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]>   notes 2, 5
map/iou=0.50:0.95/area=all/max_dets=100                     note 3
map/iou=0.50/area=all/max_dets=100                          note 3
map/iou=0.75/area=all/max_dets=100                          note 3
map/iou=0.50:0.95/area=small/max_dets=100                   notes 3, 5
map/iou=0.50:0.95/area=medium/max_dets=100                  notes 3, 5
map/iou=0.50:0.95/area=large/max_dets=100                   notes 3, 5
mar/iou=0.50:0.95/area=all/max_dets=1                       note 4
mar/iou=0.50/area=all/max_dets=10                           note 4
mar/iou=0.75/area=all/max_dets=100                          note 4
mar/iou=0.50:0.95/area=small/max_dets=100                   notes 4, 5
mar/iou=0.50:0.95/area=medium/max_dets=100                  notes 4, 5
mar/iou=0.50:0.95/area=large/max_dets=100                   notes 4, 5

Note 1: Average precision for class label_names[l], where \(l\) is the index of the class. If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

Note 2: Average recall for class label_names[l], where \(l\) is the index of the class. If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

Note 3: The average of average precisions over classes.

Note 4: The average of average recalls over classes.

Note 5: Skipped if gt_areas is None.

Parameters
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, bbox, label or img, bbox, label, area, crowded.

  • target (chainer.Link) – A detection link. This link must have predict() method that takes a list of images and returns bboxes, labels and scores.

  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precisions and average recalls for each class are also reported.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this extension scatters the iterator of root worker and gathers the results to the root worker.

DetectionVOCEvaluator
class chainercv.extensions.DetectionVOCEvaluator(iterator, target, use_07_metric=False, label_names=None, comm=None)[source]

An extension that evaluates a detection model by PASCAL VOC metric.

This extension iterates over an iterator and evaluates the prediction results by average precisions (APs) and mean of them (mean Average Precision, mAP). This extension reports the following values with keys. Please note that 'ap/<label_names[l]>' is reported only if label_names is specified.

  • 'map': Mean of average precisions (mAP).

  • 'ap/<label_names[l]>': Average precision for class label_names[l], where \(l\) is the index of the class. For example, this evaluator reports 'ap/aeroplane', 'ap/bicycle', etc. if label_names is voc_bbox_label_names. If there is no bounding box assigned to class label_names[l] in either ground truth or prediction, it reports numpy.nan as its average precision. In this case, mAP is computed without this class.

Parameters
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, bbox, label or img, bbox, label, difficult. img is an image, bbox is coordinates of bounding boxes, label is labels of the bounding boxes and difficult is whether the bounding boxes are difficult or not. If difficult is returned, difficult ground truth is ignored in evaluation.

  • target (chainer.Link) – A detection link. This link must have predict() method that takes a list of images and returns bboxes, labels and scores.

  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precision for each class is also reported with the key 'ap/<label_names[l]>'.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this extension scatters the iterator of root worker and gathers the results to the root worker.
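
A minimal sketch of wiring this evaluator into a Chainer training loop. train_iter, test_iter, model and optimizer are assumed to be prepared elsewhere; the 'validation/' prefix in the report key is the default name of Chainer evaluator extensions.

from chainer import training
from chainer.training import extensions
from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVOCEvaluator

# train_iter, test_iter, model and optimizer are assumed to be set up
# elsewhere; only the extension wiring is shown here.
updater = training.StandardUpdater(train_iter, optimizer, device=-1)
trainer = training.Trainer(updater, (10, 'epoch'))
trainer.extend(
    DetectionVOCEvaluator(
        test_iter, model, use_07_metric=True,
        label_names=voc_bbox_label_names),
    trigger=(1, 'epoch'))
trainer.extend(extensions.PrintReport(['epoch', 'validation/main/map']))
trainer.run()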

InstanceSegmentationCOCOEvaluator
class chainercv.extensions.InstanceSegmentationCOCOEvaluator(iterator, target, label_names=None, comm=None)[source]

An extension that evaluates an instance segmentation model by MS COCO metric.

This extension iterates over an iterator and evaluates the prediction results. The results consist of average precisions (APs) and average recalls (ARs) as well as the mean of each (mean average precision and mean average recall). This extension reports the following values with keys. Please note that if label_names is not specified, only the mAPs and mARs are reported.

The underlying dataset of the iterator is assumed to return img, mask, label or img, mask, label, area, crowded.

key                                                         description
ap/iou=0.50:0.95/area=all/max_dets=100/<label_names[l]>     note 6
ap/iou=0.50/area=all/max_dets=100/<label_names[l]>          note 6
ap/iou=0.75/area=all/max_dets=100/<label_names[l]>          note 6
ap/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]>   notes 6, 10
ap/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]>  notes 6, 10
ap/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]>   notes 6, 10
ar/iou=0.50:0.95/area=all/max_dets=1/<label_names[l]>       note 7
ar/iou=0.50/area=all/max_dets=10/<label_names[l]>           note 7
ar/iou=0.75/area=all/max_dets=100/<label_names[l]>          note 7
ar/iou=0.50:0.95/area=small/max_dets=100/<label_names[l]>   notes 7, 10
ar/iou=0.50:0.95/area=medium/max_dets=100/<label_names[l]>  notes 7, 10
ar/iou=0.50:0.95/area=large/max_dets=100/<label_names[l]>   notes 7, 10
map/iou=0.50:0.95/area=all/max_dets=100                     note 8
map/iou=0.50/area=all/max_dets=100                          note 8
map/iou=0.75/area=all/max_dets=100                          note 8
map/iou=0.50:0.95/area=small/max_dets=100                   notes 8, 10
map/iou=0.50:0.95/area=medium/max_dets=100                  notes 8, 10
map/iou=0.50:0.95/area=large/max_dets=100                   notes 8, 10
mar/iou=0.50:0.95/area=all/max_dets=1                       note 9
mar/iou=0.50/area=all/max_dets=10                           note 9
mar/iou=0.75/area=all/max_dets=100                          note 9
mar/iou=0.50:0.95/area=small/max_dets=100                   notes 9, 10
mar/iou=0.50:0.95/area=medium/max_dets=100                  notes 9, 10
mar/iou=0.50:0.95/area=large/max_dets=100                   notes 9, 10

Note 6: Average precision for class label_names[l], where \(l\) is the index of the class. If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

Note 7: Average recall for class label_names[l], where \(l\) is the index of the class. If class \(l\) does not exist in either pred_labels or gt_labels, the corresponding value is set to numpy.nan.

Note 8: The average of average precisions over classes.

Note 9: The average of average recalls over classes.

Note 10: Skipped if gt_areas is None.

Parameters
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, mask, label or img, mask, label, area, crowded.

  • target (chainer.Link) – An instance segmentation link. This link must have a predict() method that takes a list of images and returns masks, labels and scores.

  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precisions and average recalls for each class are also reported.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this extension scatters the iterator of root worker and gathers the results to the root worker.

InstanceSegmentationVOCEvaluator
class chainercv.extensions.InstanceSegmentationVOCEvaluator(iterator, target, iou_thresh=0.5, use_07_metric=False, label_names=None, comm=None)[source]

An extension that evaluates an instance segmentation model by PASCAL VOC metric.

This extension iterates over an iterator and evaluates the prediction results by average precisions (APs) and mean of them (mean Average Precision, mAP). This extension reports the following values with keys. Please note that 'ap/<label_names[l]>' is reported only if label_names is specified.

  • 'map': Mean of average precisions (mAP).

  • 'ap/<label_names[l]>': Average precision for class label_names[l], where \(l\) is the index of the class. For example, this evaluator reports 'ap/aeroplane', 'ap/bicycle', etc. if label_names is sbd_instance_segmentation_label_names. If there is no instance assigned to class label_names[l] in either ground truth or prediction, it reports numpy.nan as its average precision. In this case, mAP is computed without this class.

Parameters
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, mask, label or img, mask, label, difficult. img is an image, mask is instance masks, label is labels of the masks and difficult is whether the instances are difficult or not. If difficult is returned, difficult ground truth is ignored in evaluation.

  • target (chainer.Link) – An instance-segmentation link. This link must have a predict() method that takes a list of images and returns masks, labels and scores.

  • iou_thresh (float) – Intersection over Union (IoU) threshold for calculating average precision. The default value is 0.5.

  • use_07_metric (bool) – Whether to use PASCAL VOC 2007 evaluation metric for calculating average precision. The default value is False.

  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, average precision for each class is also reported with the key 'ap/<label_names[l]>'.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this extension scatters the iterator of root worker and gathers the results to the root worker.

SemanticSegmentationEvaluator
class chainercv.extensions.SemanticSegmentationEvaluator(iterator, target, label_names=None, comm=None)[source]

An extension that evaluates a semantic segmentation model.

This extension iterates over an iterator and evaluates the prediction results of the model by common evaluation metrics for semantic segmentation. This extension reports values with keys below. Please note that 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>' are reported only if label_names is specified.

  • 'miou': Mean of IoUs (mIoU).

  • 'iou/<label_names[l]>': IoU for class label_names[l], where \(l\) is the index of the class. For example, if label_names is camvid_label_names, this evaluator reports 'iou/Sky', 'iou/Building', etc.

  • 'mean_class_accuracy': Mean of class accuracies.

  • 'class_accuracy/<label_names[l]>': Class accuracy for class label_names[l], where \(l\) is the index of the class.

  • 'pixel_accuracy': Pixel accuracy.

If there is no label assigned to class label_names[l] in the ground truth, values corresponding to keys 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>' are numpy.nan. In that case, their means are computed by excluding these values.

For details on the evaluation metrics, please see the documentation for chainercv.evaluations.eval_semantic_segmentation().

Parameters
  • iterator (chainer.Iterator) – An iterator. Each sample should be a tuple img, label. img is an image and label is a pixel-wise label.

  • target (chainer.Link) – A semantic segmentation link. This link should have predict() method that takes a list of images and returns labels.

  • label_names (iterable of strings) – An iterable of names of classes. If this value is specified, IoU and class accuracy for each class are also reported with the keys 'iou/<label_names[l]>' and 'class_accuracy/<label_names[l]>'.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this extension scatters the iterator of root worker and gathers the results to the root worker.

Visualization Report

DetectionVisReport
class chainercv.extensions.DetectionVisReport(iterator, target, label_names=None, filename='detection_iter={iteration}_idx={index}.jpg')[source]

An extension that visualizes output of a detection model.

This extension visualizes the predicted bounding boxes together with the ground truth bounding boxes.

Internally, this extension takes examples from an iterator, predicts bounding boxes from the images in the examples, and visualizes them using chainercv.visualizations.vis_bbox(). The process can be illustrated in the following code.

batch = next(iterator)
# Convert batch -> imgs, gt_bboxes, gt_labels
pred_bboxes, pred_labels, pred_scores = target.predict(imgs)
# Visualization code
for img, gt_bbox, gt_label, pred_bbox, pred_label, pred_score \
        in zip(imgs, gt_bboxes, gt_labels,
               pred_bboxes, pred_labels, pred_scores):
    # the ground truth
    vis_bbox(img, gt_bbox, gt_label)
    # the prediction
    vis_bbox(img, pred_bbox, pred_label, pred_score)

Note

gt_bbox and pred_bbox are float arrays of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. Each bounding box is organized by \((y_{min}, x_{min}, y_{max}, x_{max})\) in the second axis.

gt_label and pred_label are integer arrays of shape \((R,)\). Each label indicates the class of the bounding box.

pred_score is a float array of shape \((R,)\). Each score indicates how confident the prediction is.

Parameters
  • iterator – Iterator object that produces images and ground truth.

  • target – Link object used for detection.

  • label_names (iterable of strings) – Name of labels ordered according to label ids. If this is None, labels will be skipped.

  • filename (str) – Basename for the saved image. It can contain two keywords, '{iteration}' and '{index}'. They are replaced with the iteration of the trainer and the index of the sample when this extension saves an image. The default value is 'detection_iter={iteration}_idx={index}.jpg'.
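
A minimal sketch of registering this extension; trainer, test_iter and model are assumed to be set up as in the evaluator example above.

from chainercv.datasets import voc_bbox_label_names
from chainercv.extensions import DetectionVisReport

# Save visualizations of a few samples under the trainer's output
# directory once per epoch.
trainer.extend(
    DetectionVisReport(test_iter, model,
                       label_names=voc_bbox_label_names),
    trigger=(1, 'epoch'))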

Functions

Spatial Pooling

ps_roi_average_align_2d
chainercv.functions.ps_roi_average_align_2d(x, rois, roi_indices, outsize, spatial_scale, group_size, sampling_ratio=None)[source]

Position Sensitive Region of Interest (ROI) Average align function.

This function computes the position sensitive average of the input spatial patch for the given regions of interest. Each ROI is split into \((group\_size, group\_size)\) regions, and a position sensitive value is computed for each region.

Parameters
  • x (Variable) – Input variable. The shape is expected to be four dimensional: (n: batch, c: channel, h: height, w: width).

  • rois (array) – Input ROIs. The shape is expected to be \((R, 4)\), and each datum is set as below: (y_min, x_min, y_max, x_max). The dtype is numpy.float32.

  • roi_indices (array) – Input ROI indices. The shape is expected to be \((R, )\). The dtype is numpy.int32.

  • outsize ((int, int, int) or (int, int) or int) – Expected output size after pooling: (channel, height, width) or (height, width) or outsize. outsize=o and outsize=(o, o) are equivalent. The channel parameter is used to assert the input shape.

  • spatial_scale (float) – Scale by which ROI coordinates are multiplied to map them onto x.

  • group_size (int) – Position sensitive group size.

  • sampling_ratio ((int, int) or int) – Sampling step for the alignment. It must be an integer of at least \(1\) or None; when None is passed, the value is decided automatically. Different ratios for the height and width axes are supported by passing a tuple of ints as (sampling_ratio_h, sampling_ratio_w). sampling_ratio=s and sampling_ratio=(s, s) are equivalent.

Returns

Output variable.

Return type

Variable

See the original paper proposing PSROIPooling: R-FCN. See the original paper proposing ROIAlign: Mask R-CNN.

ps_roi_average_pooling_2d
chainercv.functions.ps_roi_average_pooling_2d(x, rois, roi_indices, outsize, spatial_scale, group_size)[source]

Position Sensitive Region of Interest (ROI) Average pooling function.

This function computes the position sensitive average of the input spatial patch for the given regions of interest. Each ROI is split into \((group\_size, group\_size)\) regions, and a position sensitive value is computed for each region.

Parameters
  • x (Variable) – Input variable. The shape is expected to be four dimensional: (n: batch, c: channel, h: height, w: width).

  • rois (array) – Input ROIs. The shape is expected to be \((R, 4)\), and each datum is set as below: (y_min, x_min, y_max, x_max). The dtype is numpy.float32.

  • roi_indices (array) – Input ROI indices. The shape is expected to be \((R, )\). The dtype is numpy.int32.

  • outsize ((int, int, int) or (int, int) or int) – Expected output size after pooling: (channel, height, width) or (height, width) or outsize. outsize=o and outsize=(o, o) are equivalent. The channel parameter is used to assert the input shape.

  • spatial_scale (float) – Scale by which ROI coordinates are multiplied to map them onto x.

  • group_size (int) – Position sensitive group size.

Returns

Output variable.

Return type

Variable

See the original paper proposing PSROIPooling: R-FCN.

ps_roi_max_align_2d
chainercv.functions.ps_roi_max_align_2d(x, rois, roi_indices, outsize, spatial_scale, group_size, sampling_ratio=None)[source]

Position Sensitive Region of Interest (ROI) Max align function.

This function computes the position sensitive maximum of the input spatial patch for the given regions of interest. Each ROI is split into \((group\_size, group\_size)\) regions, and a position sensitive value is computed for each region.

Parameters
  • x (Variable) – Input variable. The shape is expected to be four dimensional: (n: batch, c: channel, h: height, w: width).

  • rois (array) – Input ROIs. The shape is expected to be \((R, 4)\), and each datum is set as below: (y_min, x_min, y_max, x_max). The dtype is numpy.float32.

  • roi_indices (array) – Input ROI indices. The shape is expected to be \((R, )\). The dtype is numpy.int32.

  • outsize ((int, int, int) or (int, int) or int) – Expected output size after pooling: (channel, height, width) or (height, width) or outsize. outsize=o and outsize=(o, o) are equivalent. The channel parameter is used to assert the input shape.

  • spatial_scale (float) – Scale by which ROI coordinates are multiplied to map them onto x.

  • group_size (int) – Position sensitive group size.

  • sampling_ratio ((int, int) or int) – Sampling step for the alignment. It must be an integer of at least \(1\) or None; when None is passed, the value is decided automatically. Different ratios for the height and width axes are supported by passing a tuple of ints as (sampling_ratio_h, sampling_ratio_w). sampling_ratio=s and sampling_ratio=(s, s) are equivalent.

Returns

Output variable.

Return type

Variable

See the original paper proposing PSROIPooling: R-FCN. See the original paper proposing ROIAlign: Mask R-CNN.

ps_roi_max_pooling_2d
chainercv.functions.ps_roi_max_pooling_2d(x, rois, roi_indices, outsize, spatial_scale, group_size)[source]

Position Sensitive Region of Interest (ROI) Max pooling function.

This function computes the position sensitive maximum of the input spatial patch for the given regions of interest. Each ROI is split into \((group\_size, group\_size)\) regions, and a position sensitive value is computed for each region.

Parameters
  • x (Variable) – Input variable. The shape is expected to be four dimensional: (n: batch, c: channel, h: height, w: width).

  • rois (array) – Input ROIs. The shape is expected to be \((R, 4)\), and each datum is set as below: (y_min, x_min, y_max, x_max). The dtype is numpy.float32.

  • roi_indices (array) – Input ROI indices. The shape is expected to be \((R, )\). The dtype is numpy.int32.

  • outsize ((int, int, int) or (int, int) or int) – Expected output size after pooling: (channel, height, width) or (height, width) or outsize. outsize=o and outsize=(o, o) are equivalent. The channel parameter is used to assert the input shape.

  • spatial_scale (float) – Scale by which ROI coordinates are multiplied to map them onto x.

  • group_size (int) – Position sensitive group size.

Returns

Output variable.

Return type

Variable

See the original paper proposing PSROIPooling: R-FCN.

Transforms

Image

center_crop
chainercv.transforms.center_crop(img, size, return_param=False, copy=False)[source]

Center crop an image by size.

An image is cropped to size. The center of the output image and the center of the input image are the same.

Parameters
  • img (ndarray) – An image array to be cropped. This is in CHW format.

  • size (tuple) – The size of output image after cropping. This value is \((height, width)\).

  • return_param (bool) – If True, this function returns information of slices.

  • copy (bool) – If False, a view of img is returned.

Returns

If return_param = False, returns an array out_img that is cropped from the input array.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.

  • x_slice (slice): Similar to y_slice.

    out_img = img[:, y_slice, x_slice]
    

Return type

ndarray or (ndarray, dict)
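
A minimal, self-contained sketch; the returned slices reproduce the crop:

import numpy as np
from chainercv.transforms import center_crop

img = np.random.uniform(size=(3, 48, 64)).astype(np.float32)
out_img, param = center_crop(img, (32, 32), return_param=True)
assert out_img.shape == (3, 32, 32)
# The slices in param reproduce the crop.
assert np.array_equal(out_img, img[:, param['y_slice'], param['x_slice']])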

flip
chainercv.transforms.flip(img, y_flip=False, x_flip=False, copy=False)[source]

Flip an image in vertical or horizontal direction as specified.

Parameters
  • img (ndarray) – An array that gets flipped. This is in CHW format.

  • y_flip (bool) – Flip in vertical direction.

  • x_flip (bool) – Flip in horizontal direction.

  • copy (bool) – If False, a view of img will be returned.

Returns

Transformed img in CHW format.

pca_lighting
chainercv.transforms.pca_lighting(img, sigma, eigen_value=None, eigen_vector=None)[source]

AlexNet style color augmentation.

This method adds a noise vector drawn from a Gaussian. The direction of the Gaussian is the same as that of the principal components of the dataset.

This method is used in training of AlexNet 1.

1

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.

Parameters
  • img (ndarray) – An image array to be augmented. This is in CHW and RGB format.

  • sigma (float) – Standard deviation of the Gaussian. In the original paper, this value is 10% of the range of intensity (25.5 if the range is \([0, 255]\)).

  • eigen_value (ndarray) – An array of eigen values. The shape has to be \((3,)\). If it is not specified, the values computed from ImageNet are used.

  • eigen_vector (ndarray) – An array of eigen vectors. The shape has to be \((3, 3)\). If it is not specified, the vectors computed from ImageNet are used.

Returns

An image in CHW format.
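
A minimal sketch with a random image; sigma = 25.5 corresponds to 10% of the \([0, 255]\) intensity range, the value used in the original paper.

import numpy as np
from chainercv.transforms import pca_lighting

img = np.random.uniform(0, 255, size=(3, 32, 32)).astype(np.float32)
out_img = pca_lighting(img, 25.5)  # default ImageNet eigen values/vectors
assert out_img.shape == img.shape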

random_crop
chainercv.transforms.random_crop(img, size, return_param=False, copy=False)[source]

Crop array randomly into size.

The input image is cropped by a randomly selected region whose shape is size.

Parameters
  • img (ndarray) – An image array to be cropped. This is in CHW format.

  • size (tuple) – The size of output image after cropping. This value is \((height, width)\).

  • return_param (bool) – If True, this function returns information of slices.

  • copy (bool) – If False, a view of img is returned.

Returns

If return_param = False, returns an array out_img that is cropped from the input array.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.

  • x_slice (slice): Similar to y_slice.

    out_img = img[:, y_slice, x_slice]
    

Return type

ndarray or (ndarray, dict)

random_expand
chainercv.transforms.random_expand(img, max_ratio=4, fill=0, return_param=False)[source]

Expand an image randomly.

This method randomly places the input image on a larger canvas. The size of the canvas is \((rH, rW)\), where \((H, W)\) is the size of the input image and \(r\) is a random ratio drawn from \([1, max\_ratio]\). The canvas is filled by a value fill except for the region where the original image is placed.

This data augmentation trick is used to create “zoom out” effect 2.

2

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.

Parameters
  • img (ndarray) – An image array to be augmented. This is in CHW format.

  • max_ratio (float) – The maximum ratio of expansion. In the original paper, this value is 4.

  • fill (float, tuple or ndarray) – The value of padded pixels. In the original paper, this value is the mean of ImageNet. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.

  • return_param (bool) – Returns random parameters.

Returns

If return_param = False, returns an array out_img that is the result of expansion.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • ratio (float): The sampled value used to make the canvas.

  • y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.

  • x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.

Return type

ndarray or (ndarray, dict)
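
A minimal sketch; the offsets in param locate the original image on the enlarged canvas:

import numpy as np
from chainercv.transforms import random_expand

img = np.random.uniform(size=(3, 32, 32)).astype(np.float32)
out_img, param = random_expand(img, max_ratio=2, return_param=True)
# The original image sits at (y_offset, x_offset) on the canvas.
print(out_img.shape, param['ratio'],
      param['y_offset'], param['x_offset'])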

random_flip
chainercv.transforms.random_flip(img, y_random=False, x_random=False, return_param=False, copy=False)[source]

Randomly flip an image in vertical or horizontal direction.

Parameters
  • img (ndarray) – An array that gets flipped. This is in CHW format.

  • y_random (bool) – Randomly flip in vertical direction.

  • x_random (bool) – Randomly flip in horizontal direction.

  • return_param (bool) – Returns information of flip.

  • copy (bool) – If False, a view of img will be returned.

Returns

If return_param = False, returns an array out_img that is the result of flipping.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_flip (bool): Whether the image was flipped in the vertical direction or not.

  • x_flip (bool): Whether the image was flipped in the horizontal direction or not.

Return type

ndarray or (ndarray, dict)

random_rotate
chainercv.transforms.random_rotate(img, return_param=False)[source]

Randomly rotate images by 90, 180, 270 or 360 degrees.

Parameters
  • img (ndarray) – An array that gets rotated. This is in CHW format.

  • return_param (bool) – Returns information of rotation.

Returns

If return_param = False, returns an array out_img that is the result of rotation.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • k (int): The integer that represents the number of times the image is rotated by 90 degrees.

Return type

ndarray or (ndarray, dict)

random_sized_crop
chainercv.transforms.random_sized_crop(img, scale_ratio_range=(0.08, 1), aspect_ratio_range=(0.75, 1.3333333333333333), return_param=False, copy=False)[source]

Crop an image to random size and aspect ratio.

The size \((H_{crop}, W_{crop})\) and the left top coordinate \((y_{start}, x_{start})\) of the crop are calculated as follows:

  • \(H_{crop} = \lfloor{\sqrt{s \times H \times W \times a}}\rfloor\)

  • \(W_{crop} = \lfloor{\sqrt{s \times H \times W \div a}}\rfloor\)

  • \(y_{start} \sim Uniform\{0, H - H_{crop}\}\)

  • \(x_{start} \sim Uniform\{0, W - W_{crop}\}\)

  • \(s \sim Uniform(s_1, s_2)\)

  • \(b \sim Uniform(a_1, a_2)\) and \(a = b\) or \(a = \frac{1}{b}\) with 50/50 probability.

Here, \(s_1, s_2\) are the two floats in scale_ratio_range and \(a_1, a_2\) are the two floats in aspect_ratio_range. Also, \(H\) and \(W\) are the height and the width of the image. Note that \(s \approx \frac{H_{crop} \times W_{crop}}{H \times W}\) and \(a \approx \frac{H_{crop}}{W_{crop}}\). The approximations come from flooring floats to integers.

Note

If a valid scale and aspect ratio cannot be sampled in ten trials, the values are picked in a non-uniform way. If this happens, the selected scale ratio can be smaller than scale_ratio_range[0].

Parameters
  • img (ndarray) – An image array. This is in CHW format.

  • scale_ratio_range (tuple of two floats) – Determines the distribution from which a scale ratio is sampled. The default values are selected so that the area of the crop is 8~100% of the original image. This is the default setting used to train ResNets in Torch style.

  • aspect_ratio_range (tuple of two floats) – Determines the distribution from which an aspect ratio is sampled. The default values are \(\frac{3}{4}\) and \(\frac{4}{3}\), which are also the default setting to train ResNets in Torch style.

  • return_param (bool) – Returns parameters if True.

Returns

If return_param = False, returns only the cropped image.

If return_param = True, returns a tuple of cropped image and param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_slice (slice): A slice used to crop the input image. The relation below holds together with x_slice.

  • x_slice (slice): Similar to y_slice.

    out_img = img[:, y_slice, x_slice]
    
  • scale_ratio (float): \(s\) in the description (see above).

  • aspect_ratio (float): \(a\) in the description.

Return type

ndarray or (ndarray, dict)
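
A minimal sketch; the sampled \(s\) and \(a\) are returned in param together with the crop slices:

import numpy as np
from chainercv.transforms import random_sized_crop

img = np.random.uniform(size=(3, 64, 64)).astype(np.float32)
out_img, param = random_sized_crop(img, return_param=True)
print(param['scale_ratio'], param['aspect_ratio'])
# The slices in param reproduce the crop.
assert np.array_equal(out_img, img[:, param['y_slice'], param['x_slice']])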

resize
chainercv.transforms.resize(img, size, interpolation=2)[source]

Resize image to match the given shape.

The backend used by resize() is configured by chainer.global_config.cv_resize_backend. Two backends are supported: “cv2” and “PIL”. If this is None, “cv2” is used whenever “cv2” is installed, and “PIL” is used when “cv2” is not installed.

Parameters
  • img (ndarray) – An array to be transformed. This is in CHW format and the type should be numpy.float32.

  • size (tuple) – This is a tuple of length 2. Its elements are ordered as (height, width).

  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.

Returns

A resized array in CHW format.

Return type

ndarray

resize_contain
chainercv.transforms.resize_contain(img, size, fill=0, interpolation=2, return_param=False)[source]

Resize the image to fit in the given area while keeping aspect ratio.

If both the height and the width in size are larger than the height and the width of the img, the img is placed on the center with an appropriate padding to match size.

Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving aspect ratio.

Parameters
  • img (ndarray) – An array to be transformed. This is in CHW format.

  • size (tuple of two ints) – A tuple of two elements: height, width. The size of the image after resizing.

  • fill (float, tuple or ndarray) – The value of padded pixels. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of img.

  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.

  • return_param (bool) – Returns information of resizing and offsetting.

Returns

If return_param = False, returns an array out_img that is the result of resizing.

If return_param = True, returns a tuple whose elements are out_img, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • y_offset (int): The y coordinate of the top left corner of the image after placing on the canvas.

  • x_offset (int): The x coordinate of the top left corner of the image after placing on the canvas.

  • scaled_size (tuple): The size to which the image is scaled to before placing it on a canvas. This is a tuple of two elements: height, width.

Return type

ndarray or (ndarray, dict)
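
A minimal sketch; a 32x64 image is scaled to fit a 48x48 canvas and padded:

import numpy as np
from chainercv.transforms import resize_contain

img = np.random.uniform(size=(3, 32, 64)).astype(np.float32)
out_img, param = resize_contain(img, (48, 48), return_param=True)
assert out_img.shape == (3, 48, 48)
# The image is scaled to scaled_size, then placed at (y_offset, x_offset).
print(param['scaled_size'], param['y_offset'], param['x_offset'])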

rotate
chainercv.transforms.rotate(img, angle, expand=True, fill=0, interpolation=2)[source]

Rotate images by degrees.

The backend used by rotate() is configured by chainer.global_config.cv_rotate_backend. Two backends are supported: “cv2” and “PIL”. If this is None, “cv2” is used whenever “cv2” is installed, and “PIL” is used when “cv2” is not installed.

Parameters
  • img (ndarray) – An arrays that get rotated. This is in CHW format.

  • angle (float) – Counter clock-wise rotation angle (degree).

  • expand (bool) – Whether the output is expanded to contain the whole rotated image. If True, the input image is contained completely in the output.

  • fill (float) – The value used for pixels outside the boundaries.

  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC. Bilinear interpolation is the default strategy.

Returns

Returns an array out_img that is the result of rotation.

Return type

ndarray

scale
chainercv.transforms.scale(img, size, fit_short=True, interpolation=2)[source]

Rescales the input image to the given “size”.

When fit_short == True, the input image will be resized so that the shorter edge will be scaled to length size after resizing. For example, if the height of the image is larger than its width, the image will be resized to (size * height / width, size).

Otherwise, the input image will be resized so that the longer edge will be scaled to length size after resizing.

Parameters
  • img (ndarray) – An image array to be scaled. This is in CHW format.

  • size (int) – The length of the smaller edge.

  • fit_short (bool) – Determines whether to match the length of the shorter edge or the longer edge to size.

  • interpolation (int) – Determines sampling strategy. This is one of PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS. Bilinear interpolation is the default strategy.

Returns

A scaled image in CHW format.

Return type

ndarray

ten_crop
chainercv.transforms.ten_crop(img, size)[source]

Crop 10 regions from an array.

This method crops 10 regions. All regions have the shape size. These regions consist of 1 center crop and 4 corner crops and horizontal flips of them.

The crops are ordered in this order.

  • center crop

  • top-left crop

  • bottom-left crop

  • top-right crop

  • bottom-right crop

  • center crop (flipped horizontally)

  • top-left crop (flipped horizontally)

  • bottom-left crop (flipped horizontally)

  • top-right crop (flipped horizontally)

  • bottom-right crop (flipped horizontally)

Parameters
  • img (ndarray) – An image array to be cropped. This is in CHW format.

  • size (tuple) – The size of output images after cropping. This value is \((height, width)\).

Returns

The cropped arrays. The shape of tensor is \((10, C, H, W)\).
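
A minimal sketch confirming the output layout:

import numpy as np
from chainercv.transforms import ten_crop

img = np.random.uniform(size=(3, 64, 64)).astype(np.float32)
crops = ten_crop(img, (48, 48))
assert crops.shape == (10, 3, 48, 48)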

Bounding Box

crop_bbox
chainercv.transforms.crop_bbox(bbox, y_slice=None, x_slice=None, allow_outside_center=True, return_param=False)[source]

Translate bounding boxes to fit within the cropped area of an image.

This method is mainly used together with image cropping. This method translates the coordinates of bounding boxes like translate_bbox(). In addition, this function truncates the bounding boxes to fit within the cropped area. If a bounding box does not overlap with the cropped area, this bounding box will be removed.

Parameters
  • bbox (ndarray) – See the table below.

  • y_slice (slice) – The slice of y axis.

  • x_slice (slice) – The slice of x axis.

  • allow_outside_center (bool) – If this argument is False, bounding boxes whose centers are outside of the cropped area are removed. The default value is True.

  • return_param (bool) – If True, this function returns indices of kept bounding boxes.

name  shape       dtype    format
bbox  \((R, 4)\)  float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

Returns

If return_param = False, returns an array bbox.

If return_param = True, returns a tuple whose elements are bbox, param. param is a dictionary of intermediate parameters whose contents are listed below with key, value-type and the description of the value.

  • index (numpy.ndarray): An array holding indices of used bounding boxes.

  • truncated_index (numpy.ndarray): An array holding indices of truncated bounding boxes, with respect to the returned bbox rather than the original bbox.

Return type

ndarray or (ndarray, dict)

flip_bbox
chainercv.transforms.flip_bbox(bbox, size, y_flip=False, x_flip=False)[source]

Flip bounding boxes accordingly.

Parameters
  • bbox (ndarray) – See the table below.

  • size (tuple) – A tuple of length 2. The height and the width of the image.

  • y_flip (bool) – Flip bounding box according to a vertical flip of an image.

  • x_flip (bool) – Flip bounding box according to a horizontal flip of an image.

name  shape       dtype    format
bbox  \((R, 4)\)  float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

Returns

Bounding boxes flipped according to the given flips.

Return type

ndarray
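
A typical use is pairing this function with random_flip() so the boxes follow the image; a minimal sketch:

import numpy as np
from chainercv.transforms import flip_bbox, random_flip

img = np.random.uniform(size=(3, 32, 48)).astype(np.float32)
bbox = np.array([[4, 8, 16, 24]], dtype=np.float32)

img, param = random_flip(img, x_random=True, return_param=True)
# Apply the same flip to the bounding boxes.
bbox = flip_bbox(bbox, (32, 48), x_flip=param['x_flip'])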

resize_bbox
chainercv.transforms.resize_bbox(bbox, in_size, out_size)[source]

Resize bounding boxes according to image resize.

Parameters
  • bbox (ndarray) – See the table below.

  • in_size (tuple) – A tuple of length 2. The height and the width of the image before resizing.

  • out_size (tuple) – A tuple of length 2. The height and the width of the image after resizing.

name  shape       dtype    format
bbox  \((R, 4)\)  float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

Returns

Bounding boxes rescaled according to the given image shapes.

Return type

ndarray
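
A typical use is pairing this function with resize(); a minimal sketch:

import numpy as np
from chainercv.transforms import resize, resize_bbox

img = np.random.uniform(size=(3, 32, 48)).astype(np.float32)
bbox = np.array([[4, 8, 16, 24]], dtype=np.float32)

img = resize(img, (64, 96))
# Scale the box coordinates by the same factor as the image.
bbox = resize_bbox(bbox, (32, 48), (64, 96))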

rotate_bbox
chainercv.transforms.rotate_bbox(bbox, angle, size)[source]

Rotate bounding boxes by degrees.

Parameters
  • bbox (ndarray) – See the table below.

  • angle (float) – Counter clock-wise rotation angle (degree).

  • size (tuple) – A tuple of length 2. The height and the width of the image.

name  shape       dtype    format
bbox  \((R, 4)\)  float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

Returns

Bounding boxes rotated according to the given angle.

Return type

ndarray

translate_bbox
chainercv.transforms.translate_bbox(bbox, y_offset=0, x_offset=0)[source]

Translate bounding boxes.

This method is mainly used together with image transforms, such as padding and cropping, which translates the left top point of the image from coordinate \((0, 0)\) to coordinate \((y, x) = (y_{offset}, x_{offset})\).

Parameters
  • bbox (ndarray) – See the table below.

  • y_offset (int or float) – The offset along y axis.

  • x_offset (int or float) – The offset along x axis.

name  shape       dtype    format
bbox  \((R, 4)\)  float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)

Returns

Bounding boxes translated according to the given offsets.

Return type

ndarray

Point

flip_point
chainercv.transforms.flip_point(point, size, y_flip=False, x_flip=False)[source]

Modify points according to image flips.

Parameters
  • point (ndarray or list of arrays) – See the table below.

  • size (tuple) – A tuple of length 2. The height and the width of the image, which is associated with the points.

  • y_flip (bool) – Modify points according to a vertical flip of an image.

  • x_flip (bool) – Modify points according to a horizontal flip of an image.

name   shape                          dtype    format
point  \((R, K, 2)\) or \([(K, 2)]\)  float32  \((y, x)\)

Returns

Points modified according to image flips.

Return type

ndarray or list of arrays

resize_point
chainercv.transforms.resize_point(point, in_size, out_size)[source]

Adapt point coordinates to the rescaled image space.

Parameters
  • point (ndarray or list of arrays) – See the table below.

  • in_size (tuple) – A tuple of length 2. The height and the width of the image before resizing.

  • out_size (tuple) – A tuple of length 2. The height and the width of the image after resizing.

name   shape                          dtype    format
point  \((R, K, 2)\) or \([(K, 2)]\)  float32  \((y, x)\)

Returns

Points rescaled according to the given image shapes.

Return type

ndarray or list of arrays

translate_point
chainercv.transforms.translate_point(point, y_offset=0, x_offset=0)[source]

Translate points.

This method is mainly used together with image transforms, such as padding and cropping, which translates the top left point of the image to the coordinate \((y, x) = (y_{offset}, x_{offset})\).

Parameters
  • point (ndarray or list of arrays) – See the table below.

  • y_offset (int or float) – The offset along y axis.

  • x_offset (int or float) – The offset along x axis.

name   shape                          dtype    format
point  \((R, K, 2)\) or \([(K, 2)]\)  float32  \((y, x)\)

Returns

Points translated according to the given offsets.

Return type

ndarray
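
A minimal sketch; after padding an image by 5 pixels at the top and 8 on the left, the points are shifted by the same offsets:

import numpy as np
from chainercv.transforms import translate_point

point = np.array([[[10., 20.], [30., 40.]]], dtype=np.float32)  # (R, K, 2)
point = translate_point(point, y_offset=5, x_offset=8)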

Visualizations

vis_bbox

chainercv.visualizations.vis_bbox(img, bbox, label=None, score=None, label_names=None, instance_colors=None, alpha=1.0, linewidth=3.0, sort_by_score=True, ax=None)[source]

Visualize bounding boxes inside image.

Example

>>> from chainercv.datasets import VOCBboxDataset
>>> from chainercv.datasets import voc_bbox_label_names
>>> from chainercv.visualizations import vis_bbox
>>> import matplotlib.pyplot as plt
>>> dataset = VOCBboxDataset()
>>> img, bbox, label = dataset[60]
>>> vis_bbox(img, bbox, label,
...          label_names=voc_bbox_label_names)
>>> plt.show()

This example visualizes bounding boxes assigned to the same label with the same color.

>>> from chainercv.datasets import VOCBboxDataset
>>> from chainercv.datasets import voc_bbox_label_names
>>> from chainercv.visualizations import vis_bbox
>>> from chainercv.visualizations.colormap import voc_colormap
>>> import matplotlib.pyplot as plt
>>> dataset = VOCBboxDataset()
>>> img, bbox, label = dataset[61]
>>> colors = voc_colormap(label + 1)
>>> vis_bbox(img, bbox, label,
...          label_names=voc_bbox_label_names,
...          instance_colors=colors)
>>> plt.show()
Parameters
  • img (ndarray) – See the table below. If this is None, no image is displayed.

  • bbox (ndarray) – See the table below.

  • label (ndarray) – See the table below. This is optional.

  • score (ndarray) – See the table below. This is optional.

  • label_names (iterable of strings) – Name of labels ordered according to label ids. If this is None, labels will be skipped.

  • instance_colors (iterable of tuples) – List of colors. Each color is RGB format and the range of its values is \([0, 255]\). The i-th element is the color used to visualize the i-th instance. If instance_colors is None, red is used for all boxes.

  • alpha (float) – The value which determines transparency of the bounding boxes. The range of this value is \([0, 1]\).

  • linewidth (float) – The thickness of the edges of the bounding boxes.

  • sort_by_score (bool) – When True, instances with high scores are always visualized in front of instances with low scores.

  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
bbox   \((R, 4)\)     float32  \((y_{min}, x_{min}, y_{max}, x_{max})\)
label  \((R,)\)       int32    \([0, \#fg\_class - 1]\)
score  \((R,)\)       float32  --

Returns

Returns the Axes object with the plot for further tweaking.

Return type

Axes

vis_image

chainercv.visualizations.vis_image(img, ax=None)[source]

Visualize a color image.

Parameters
  • img (ndarray) – See the table below. If this is None, no image is displayed.

  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.

name  shape          dtype    format
img   \((3, H, W)\)  float32  RGB, \([0, 255]\)

Returns

Returns the Axes object with the plot for further tweaking.

Return type

Axes

vis_instance_segmentation

chainercv.visualizations.vis_instance_segmentation(img, mask, label=None, score=None, label_names=None, instance_colors=None, alpha=0.7, sort_by_score=True, ax=None)[source]

Visualize instance segmentation.

Example

This example visualizes an image and an instance segmentation.

>>> from chainercv.datasets import SBDInstanceSegmentationDataset
>>> from chainercv.datasets \
...     import sbd_instance_segmentation_label_names
>>> from chainercv.visualizations import vis_instance_segmentation
>>> import matplotlib.pyplot as plt
>>> dataset = SBDInstanceSegmentationDataset()
>>> img, mask, label = dataset[0]
>>> vis_instance_segmentation(
...     img, mask, label,
...     label_names=sbd_instance_segmentation_label_names)
>>> plt.show()

This example visualizes an image, an instance segmentation and bounding boxes.

>>> from chainercv.datasets import SBDInstanceSegmentationDataset
>>> from chainercv.datasets \
...     import sbd_instance_segmentation_label_names
>>> from chainercv.visualizations import vis_bbox
>>> from chainercv.visualizations import vis_instance_segmentation
>>> from chainercv.visualizations.colormap import voc_colormap
>>> from chainercv.utils import mask_to_bbox
>>> import matplotlib.pyplot as plt
>>> dataset = SBDInstanceSegmentationDataset()
>>> img, mask, label = dataset[0]
>>> bbox = mask_to_bbox(mask)
>>> colors = voc_colormap(list(range(1, len(mask) + 1)))
>>> ax = vis_bbox(img, bbox, label,
...     label_names=sbd_instance_segmentation_label_names,
...     instance_colors=colors, alpha=0.7, linewidth=0.5)
>>> vis_instance_segmentation(
...     None, mask, instance_colors=colors, alpha=0.7, ax=ax)
>>> plt.show()
Parameters
  • img (ndarray) – See the table below. If this is None, no image is displayed.

  • mask (ndarray) – See the table below.

  • label (ndarray) – See the table below. This is optional.

  • score (ndarray) – See the table below. This is optional.

  • label_names (iterable of strings) – Name of labels ordered according to label ids.

  • instance_colors (iterable of tuple) – List of colors. Each color is RGB format and the range of its values is \([0, 255]\). The i-th element is the color used to visualize the i-th instance. If instance_colors is None, the default color map is used.

  • alpha (float) – The value which determines transparency of the figure. The range of this value is \([0, 1]\). If this value is 0, the figure will be completely transparent. The default value is 0.7. This option is useful for overlaying the label on the source image.

  • sort_by_score (bool) – When True, instances with high scores are always visualized in front of instances with low scores.

  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
mask   \((R, H, W)\)  bool     --
label  \((R,)\)       int32    \([0, \#fg\_class - 1]\)
score  \((R,)\)       float32  --

Returns

Returns ax. ax is a matplotlib.axes.Axes with the plot.

Return type

matplotlib.axes.Axes

vis_point

chainercv.visualizations.vis_point(img, point, visible=None, ax=None)[source]

Visualize points in an image.

Example

>>> import chainercv
>>> import matplotlib.pyplot as plt
>>> dataset = chainercv.datasets.CUBKeypointDataset()
>>> img, point, visible = dataset[0]
>>> chainercv.visualizations.vis_point(img, point, visible)
>>> plt.show()
Parameters
  • img (ndarray) – See the table below. If this is None, no image is displayed.

  • point (ndarray or list of arrays) – See the table below.

  • visible (ndarray or list of arrays) – See the table below.

  • ax (matplotlib.axes.Axes) – If provided, plot on this axis.

name     shape                          dtype    format
img      \((3, H, W)\)                  float32  RGB, \([0, 255]\)
point    \((R, K, 2)\) or \([(K, 2)]\)  float32  \((y, x)\)
visible  \((R, K)\) or \([(K,)]\)       bool     --

Returns

Returns the Axes object with the plot for further tweaking.

Return type

Axes

vis_semantic_segmentation

chainercv.visualizations.vis_semantic_segmentation(img, label, label_names=None, label_colors=None, ignore_label_color=(0, 0, 0), alpha=1, all_label_names_in_legend=False, ax=None)[source]

Visualize a semantic segmentation.

Example

>>> from chainercv.datasets import VOCSemanticSegmentationDataset
>>> from chainercv.datasets \
...     import voc_semantic_segmentation_label_colors
>>> from chainercv.datasets \
...     import voc_semantic_segmentation_label_names
>>> from chainercv.visualizations import vis_semantic_segmentation
>>> import matplotlib.pyplot as plt
>>> dataset = VOCSemanticSegmentationDataset()
>>> img, label = dataset[60]
>>> ax, legend_handles = vis_semantic_segmentation(
...     img, label,
...     label_names=voc_semantic_segmentation_label_names,
...     label_colors=voc_semantic_segmentation_label_colors,
...     alpha=0.9)
>>> ax.legend(handles=legend_handles, bbox_to_anchor=(1, 1), loc=2)
>>> plt.show()
Parameters
  • img (ndarray) – See the table below. If this is None, no image is displayed.

  • label (ndarray) – See the table below.

  • label_names (iterable of strings) – Name of labels ordered according to label ids.

  • label_colors (iterable of tuples) – An iterable of colors for regular labels. Each color is RGB format and the range of its values is \([0, 255]\). If label_colors is None, the default color map is used.

  • ignore_label_color (tuple) – Color for ignored label. This is RGB format and the range of its values is \([0, 255]\). The default value is (0, 0, 0).

  • alpha (float) – The value which determines transparency of the figure. The range of this value is \([0, 1]\). If this value is 0, the figure will be completely transparent. The default value is 1. This option is useful for overlaying the label on the source image.

  • all_label_names_in_legend (bool) – Determines whether to include all label names in a legend. If this is False, the legend does not contain the names of unused labels. An unused label is defined as a label that does not appear in label. The default value is False.

  • ax (matplotlib.axes.Axis) – The visualization is displayed on this axis. If this is None (default), a new axis is created.

name   shape          dtype    format
img    \((3, H, W)\)  float32  RGB, \([0, 255]\)
label  \((H, W)\)     int32    \([-1, \#class - 1]\)

Returns

Returns ax and legend_handles. ax is a matplotlib.axes.Axes with the plot. It can be used for further tweaking. legend_handles is a list of legend handles. It can be passed to matplotlib.pyplot.legend() to show a legend.

Return type

matplotlib.axes.Axes and list of matplotlib.patches.Patch

Utils

Bounding Box Utilities

bbox_iou
chainercv.utils.bbox_iou(bbox_a, bbox_b)[source]

Calculate the Intersection over Union (IoU) of bounding boxes.

IoU is calculated as the ratio of the area of the intersection to the area of the union.

This function accepts both numpy.ndarray and cupy.ndarray as inputs. Please note that bbox_a and bbox_b need to be of the same type. The output is of the same type as the inputs.

Parameters
  • bbox_a (array) – An array whose shape is \((N, 4)\). \(N\) is the number of bounding boxes. The dtype should be numpy.float32.

  • bbox_b (array) – An array similar to bbox_a, whose shape is \((K, 4)\). The dtype should be numpy.float32.

Returns

An array whose shape is \((N, K)\). An element at index \((n, k)\) contains IoUs between \(n\) th bounding box in bbox_a and \(k\) th bounding box in bbox_b.

Return type

array
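
As a quick check, here is a minimal sketch with toy boxes: an identical pair gives IoU 1, while an 8x8 box overlapping another 8x8 box on a 4x4 corner gives \(16 / (64 + 64 - 16) = 1/7\).

>>> import numpy as np
>>> from chainercv.utils import bbox_iou
>>> bbox_a = np.array([[0, 0, 8, 8]], dtype=np.float32)
>>> bbox_b = np.array(
...     [[0, 0, 8, 8], [4, 4, 12, 12]], dtype=np.float32)
>>> bbox_iou(bbox_a, bbox_b)  # approx. [[1., 0.143]], shape (1, 2)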

non_maximum_suppression
chainercv.utils.non_maximum_suppression(bbox, thresh, score=None, limit=None)[source]

Suppress bounding boxes according to their IoUs.

This method checks each bounding box sequentially and selects it if the Intersection over Union (IoU) between the bounding box and every previously selected bounding box is less than thresh. It is mainly used as postprocessing of object detection. Bounding boxes with higher scores are selected first. If score is not provided, the bounding boxes are processed in ascending order of their indices.

The bounding boxes are expected to be packed into a two dimensional tensor of shape \((R, 4)\), where \(R\) is the number of bounding boxes in the image. The second axis represents attributes of the bounding box. They are \((y_{min}, x_{min}, y_{max}, x_{max})\), where the four attributes are coordinates of the top left and the bottom right vertices.

score is a float array of shape \((R,)\). Each score indicates confidence of prediction.

This function accepts both numpy.ndarray and cupy.ndarray as inputs. Please note that bbox and score need to be of the same type. The type of the output is the same as the input.

Parameters
  • bbox (array) – Bounding boxes to be transformed. The shape is \((R, 4)\). \(R\) is the number of bounding boxes.

  • thresh (float) – Threshold of IoUs.

  • score (array) – An array of confidences whose shape is \((R,)\).

  • limit (int) – The upper bound of the number of the output bounding boxes. If it is not specified, this method selects as many bounding boxes as possible.

Returns

An array with indices of bounding boxes that are selected. They are sorted by the scores of bounding boxes in descending order. The shape of this array is \((K,)\) and its dtype is numpy.int32. Note that \(K \leq R\).

Return type

array
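
A minimal sketch with toy values: the second box overlaps the first with an IoU of about 0.62 and is suppressed at thresh=0.5, while the third box is disjoint and survives.

>>> import numpy as np
>>> from chainercv.utils import non_maximum_suppression
>>> bbox = np.array(
...     [[0, 0, 8, 8], [1, 1, 9, 9], [20, 20, 30, 30]],
...     dtype=np.float32)
>>> score = np.array([0.9, 0.8, 0.7], dtype=np.float32)
>>> non_maximum_suppression(bbox, thresh=0.5, score=score)  # [0, 2]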

Download Utilities

cached_download
chainercv.utils.cached_download(url)[source]

Downloads a file and caches it.

This is different from the original cached_download() in that the download progress is reported. Note that this progress report can be disabled by setting the environment variable CHAINERCV_DOWNLOAD_REPORT to ‘OFF’.

It downloads a file from the URL if there is no corresponding cache. After the download, this function stores a cache to the directory under the dataset root (see set_dataset_root()). If there is already a cache for the given URL, it just returns the path to the cache without downloading the same file.

Parameters

url (string) – URL to download from.

Returns

Path to the downloaded file.

Return type

string
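
Typical usage is a single call; the URL below is a placeholder, not a real ChainerCV asset.

>>> from chainercv.utils import cached_download
>>> path = cached_download('https://example.com/data.zip')  # placeholder URL
>>> # A second call with the same URL returns the cached path
>>> # without downloading the file again.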

download_model
chainercv.utils.download_model(url)[source]

Downloads a model file and puts it under the model directory.

It downloads a file from the URL and puts it under the model directory. For example, if url is http://example.com/subdir/model.npz, the pretrained weights file will be saved to $CHAINER_DATASET_ROOT/pfnet/chainercv/models/model.npz. If there is already a file at the destination path, it just returns the path without downloading the same file.

Parameters

url (string) – URL to download from.

Returns

Path to the downloaded file.

Return type

string

extractall
chainercv.utils.extractall(file_path, destination, ext)[source]

Extracts an archive file.

This function extracts an archive file to a destination.

Parameters
  • file_path (string) – The path of a file to be extracted.

  • destination (string) – A directory path. The archive file will be extracted under this directory.

  • ext (string) – An extension suffix of the archive file. This function supports '.zip', '.tar', '.gz' and '.tgz'.
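
For instance, combined with cached_download() (the URL and destination below are placeholders):

>>> from chainercv.utils import cached_download, extractall
>>> path = cached_download('https://example.com/data.zip')  # placeholder URL
>>> extractall(path, 'data', ext='.zip')  # extract under ./data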

Image Utilities

read_image
chainercv.utils.read_image(file, dtype=<class 'numpy.float32'>, color=True, alpha=None)[source]

Read an image from a file.

This function reads an image from a given file. The image is in CHW format and the range of its values is \([0, 255]\). If color = True, the order of the channels is RGB.

The backend used by read_image() is configured by chainer.global_config.cv_read_image_backend. Two backends are supported: “cv2” and “PIL”. If this is None, “cv2” is used whenever “cv2” is installed, and “PIL” is used when “cv2” is not installed.

Parameters
  • file (string or file-like object) – A path of image file or a file-like object of image.

  • dtype – The type of array. The default value is float32.

  • color (bool) – This option determines the number of channels. If True, the number of channels is three. In this case, the order of the channels is RGB. This is the default behaviour. If False, this function returns a grayscale image.

  • alpha (None or {'ignore', 'blend_with_white', 'blend_with_black'}) –

    Choose how RGBA images are handled. By default, an error is raised. Here are the other possible behaviors:

    • 'ignore': Ignore alpha channel.

    • 'blend_with_white': Blend RGB image multiplied by alpha on a white image.

    • 'blend_with_black': Blend RGB image multiplied by alpha on a black image.

Returns

An image.

Return type

ndarray
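
A short sketch of the returned formats; 'image.png' is a placeholder path.

>>> from chainercv.utils import read_image
>>> img = read_image('image.png')  # placeholder path
>>> img.shape, img.dtype  # ((3, H, W), dtype('float32'))
>>> gray = read_image('image.png', color=False)
>>> gray.shape  # (1, H, W)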

read_label
chainercv.utils.read_label(file, dtype=<class 'numpy.int32'>)[source]

Read a label image from a file.

This function reads a label image from a given file. If reading the label doesn't work correctly, try read_image() with the parameter color=True.

Parameters
  • file (string or file-like object) – A path of image file or a file-like object of image.

  • dtype – The type of array. The default value is int32.


Returns

An image.

Return type

ndarray

tile_images
chainercv.utils.tile_images(imgs, n_col, pad=2, fill=0)[source]

Make a tile of images.

Parameters
  • imgs (numpy.ndarray) – A batch of images whose shape is BCHW.

  • n_col (int) – The number of columns in a tile.

  • pad (int or tuple of two ints) – pad_y, pad_x. This is the amounts of padding in y and x directions. If this is an integer, the amounts of padding in the two directions are the same. The default value is 2.

  • fill (float, tuple or ndarray) – The value of padded pixels. If it is numpy.ndarray, its shape should be \((C, 1, 1)\), where \(C\) is the number of channels of imgs.

Returns

An image array in CHW format. The size of this image is \(((H + pad_{y}) \times \lceil B / n_{col} \rceil, (W + pad_{x}) \times n_{col})\).

Return type

ndarray
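
A minimal sketch that checks the output size against the formula above, using a random toy batch:

>>> import numpy as np
>>> from chainercv.utils import tile_images
>>> imgs = np.random.uniform(0, 255, (8, 3, 32, 32)).astype(np.float32)
>>> tile = tile_images(imgs, n_col=4, pad=2)
>>> tile.shape  # (3, (32 + 2) * 2, (32 + 2) * 4) = (3, 68, 136)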

write_image
chainercv.utils.write_image(img, file, format=None)[source]

Save an image to a file.

This function saves an image to given file. The image is in CHW format and the range of its value is \([0, 255]\).

Parameters
  • img (ndarray) – An image to be saved.

  • file (string or file-like object) – A path of image file or a file-like object of image.

  • format ({'bmp', 'jpeg', 'png'}) – The format of image. If file is a file-like object, this option must be specified.
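
A minimal sketch that saves an all-black image; 'black.png' is a placeholder name.

>>> import numpy as np
>>> from chainercv.utils import write_image
>>> img = np.zeros((3, 64, 64), dtype=np.float32)
>>> write_image(img, 'black.png')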

Iterator Utilities

apply_to_iterator
chainercv.utils.apply_to_iterator(func, iterator, n_input=1, hook=None, comm=None)[source]

Apply a function/method to batches from an iterator.

This function applies a function/method to an iterator of batches.

It assumes that the iterator iterates over a collection of tuples that contain inputs to func(). Additionally, the tuples may contain values that are not used by func(). For convenience, we also allow the iterator to iterate over a collection of inputs that are not tuples. Here is an illustration of the expected behavior of the iterator. This behaviour is the same as chainer.Iterator.

>>> batch = next(iterator)
>>> # batch: [in_val]
or
>>> # batch: [(in_val0, ..., in_val{n_input - 1})]
or
>>> # batch: [(in_val0, ..., in_val{n_input - 1}, rest_val0, ...)]

func() should take batch(es) of data and return batch(es) of computed values. Here is an illustration of the expected behavior of the function.

>>> out_vals = func([in_val0], ..., [in_val{n_input - 1}])
>>> # out_vals: [out_val]
or
>>> out_vals0, out_vals1, ... = func([in_val0], ..., [in_val{n_input - 1}])
>>> # out_vals0: [out_val0]
>>> # out_vals1: [out_val1]

With apply_to_iterator(), users can get iterator(s) of values returned by func(). It also returns iterator(s) of input values and values that are not used for computation.

>>> in_values, out_values, rest_values = apply_to_iterator(
>>>     func, iterator, n_input)
>>> # in_values: (iter of in_val0, ..., iter of in_val{n_input - 1})
>>> # out_values: (iter of out_val0, ...)
>>> # rest_values: (iter of rest_val0, ...)

Here is an example, which applies a pretrained Faster R-CNN to the PASCAL VOC dataset.

>>> from chainer import iterators
>>>
>>> from chainercv.datasets import VOCBBoxDataset
>>> from chainercv.links import FasterRCNNVGG16
>>> from chainercv.utils import apply_to_iterator
>>>
>>> dataset = VOCBBoxDataset(year='2007', split='test')
>>> # next(iterator) -> [(img, gt_bbox, gt_label)]
>>> iterator = iterators.SerialIterator(
...     dataset, 2, repeat=False, shuffle=False)
>>>
>>> # model.predict([img]) -> ([pred_bbox], [pred_label], [pred_score])
>>> model = FasterRCNNVGG16(pretrained_model='voc07')
>>>
>>> in_values, out_values, rest_values = apply_to_iterator(
...     model.predict, iterator)
>>>
>>> # in_values contains one iterator
>>> imgs, = in_values
>>> # out_values contains three iterators
>>> pred_bboxes, pred_labels, pred_scores = out_values
>>> # rest_values contains two iterators
>>> gt_bboxes, gt_labels = rest_values
Parameters
  • func – A callable that takes batch(es) of input data and returns computed data.

  • iterator (iterator) – An iterator of batches. The first n_input elements in each sample are treated as input values. They are passed to func. If comm is specified, only the iterator of the root worker is used.

  • n_input (int) – The number of input data. The default value is 1.

  • hook – A callable that is called after each iteration. in_values, out_values, and rest_values are passed as arguments. Note that these values do not contain data from the previous iterations. If comm is specified, only the root worker executes this hook.

  • comm (CommunicatorBase) – A ChainerMN communicator. If it is specified, this function scatters the iterator of root worker and gathers the results to the root worker.

Returns

This function returns three tuples of iterators: in_values, out_values and rest_values.

  • in_values: A tuple of iterators. Each iterator returns a corresponding input value. For example, if func() takes [in_val0], [in_val1], next(in_values[0]) and next(in_values[1]) will be in_val0 and in_val1.

  • out_values: A tuple of iterators. Each iterator returns a corresponding computed value. For example, if func() returns ([out_val0], [out_val1]), next(out_values[0]) and next(out_values[1]) will be out_val0 and out_val1.

  • rest_values: A tuple of iterators. Each iterator returns a corresponding rest value. For example, if the iterator returns [(in_val0, in_val1, rest_val0, rest_val1)], next(rest_values[0]) and next(rest_values[1]) will be rest_val0 and rest_val1. If the input iterator does not give any rest values, this tuple will be empty.

Return type

Three tuples of iterators

ProgressHook
class chainercv.utils.ProgressHook(n_total=None)[source]

A hook class reporting the progress of iteration.

This is a hook class designed for apply_to_iterator().

Parameters

n_total (int) – The number of images. This argument is optional.
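
A usage sketch, reusing model, iterator and dataset from the apply_to_iterator() example above:

>>> from chainercv.utils import apply_to_iterator, ProgressHook
>>> in_values, out_values, rest_values = apply_to_iterator(
...     model.predict, iterator, hook=ProgressHook(n_total=len(dataset)))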

unzip
chainercv.utils.unzip(iterable)[source]

Converts an iterable of tuples into a tuple of iterators.

This function converts an iterable of tuples into a tuple of iterators. This is an inverse function of six.moves.zip().

>>> from chainercv.utils import unzip
>>> data = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
>>> int_iter, str_iter = unzip(data)
>>>
>>> next(int_iter)  # 0
>>> next(int_iter)  # 1
>>> next(int_iter)  # 2
>>>
>>> next(str_iter)  # 'a'
>>> next(str_iter)  # 'b'
>>> next(str_iter)  # 'c'
Parameters

iterable (iterable) – An iterable of tuples. All tuples should have the same length.

Returns

Each iterator corresponds to an element of the input tuples. Note that each iterator stores values until they are popped. To reduce memory usage, it is recommended to delete unused iterators.

Return type

tuple of iterators

Mask Utilities

mask_iou
chainercv.utils.mask_iou(mask_a, mask_b)[source]

Calculate the Intersection over Union (IoU) of masks.

IoU is calculated as the ratio of the area of the intersection to the area of the union.

This function accepts both numpy.ndarray and cupy.ndarray as inputs. Please note that mask_a and mask_b need to be of the same type. The output is of the same type as the inputs.

Parameters
  • mask_a (array) – An array whose shape is \((N, H, W)\). \(N\) is the number of masks. The dtype should be numpy.bool.

  • mask_b (array) – An array similar to mask_a, whose shape is \((K, H, W)\). The dtype should be numpy.bool.

Returns

An array whose shape is \((N, K)\). An element at index \((n, k)\) contains IoUs between \(n\) th mask in mask_a and \(k\) th mask in mask_b.

Return type

array
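
A minimal sketch with two toy \(4 \times 4\) masks of eight pixels each that share four pixels, so the IoU is \(4 / (8 + 8 - 4) = 1/3\):

>>> import numpy as np
>>> from chainercv.utils import mask_iou
>>> mask_a = np.zeros((1, 4, 4), dtype=bool)
>>> mask_a[0, :2] = True   # rows 0-1
>>> mask_b = np.zeros((1, 4, 4), dtype=bool)
>>> mask_b[0, 1:3] = True  # rows 1-2
>>> mask_iou(mask_a, mask_b)  # approx. [[0.333]], shape (1, 1)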

mask_to_bbox
chainercv.utils.mask_to_bbox(mask)[source]

Compute the bounding boxes around the masked regions.

This function accepts both numpy.ndarray and cupy.ndarray as inputs.

Parameters

mask (array) – An array whose shape is \((R, H, W)\). \(R\) is the number of masks. The dtype should be numpy.bool.

Returns

The bounding boxes around the masked regions. This is an array whose shape is \((R, 4)\). \(R\) is the number of bounding boxes. The dtype should be numpy.float32.

Return type

array
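
A minimal sketch with a single rectangular mask:

>>> import numpy as np
>>> from chainercv.utils import mask_to_bbox
>>> mask = np.zeros((1, 8, 8), dtype=bool)
>>> mask[0, 2:5, 3:7] = True
>>> mask_to_bbox(mask)  # one box in (y_min, x_min, y_max, x_max) order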

scale_mask
chainercv.utils.scale_mask(mask, bbox, size)[source]

Scale instance segmentation mask while keeping the aspect ratio.

This function exploits the sparsity of mask to speed up resize operation.

The input is resized so that the length of its shorter edge becomes size.

Parameters
  • mask (array) – An array whose shape is \((R, H, W)\). \(R\) is the number of masks. The dtype should be numpy.bool.

  • bbox (array) – The bounding boxes around the masked region of mask. This is expected to be the value obtained by bbox = chainercv.utils.mask_to_bbox(mask).

  • size (int) – The length of the smaller edge.

Returns

An array whose shape is \((R, H, W)\). \(R\) is the number of masks. The dtype should be numpy.bool.

Return type

array
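
A sketch of the expected shapes: for a \((1, 60, 80)\) mask and size=120, the shorter edge (60) is scaled to 120, so both edges double.

>>> import numpy as np
>>> from chainercv.utils import mask_to_bbox, scale_mask
>>> mask = np.zeros((1, 60, 80), dtype=bool)
>>> mask[0, 10:40, 20:70] = True
>>> bbox = mask_to_bbox(mask)
>>> scale_mask(mask, bbox, 120).shape  # (1, 120, 160)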

Testing Utilities

assert_is_bbox
chainercv.utils.assert_is_bbox(bbox, size=None)[source]

Checks if bounding boxes satisfy the bounding box format.

This function checks if the given bounding boxes satisfy the bounding box format or not. If the bounding boxes do not satisfy the format, this function raises an AssertionError.

Parameters
  • bbox (ndarray) – Bounding boxes to be checked.

  • size (tuple of ints) – The size of an image. If this argument is specified, each bounding box should be within the image.

assert_is_bbox_dataset
chainercv.utils.assert_is_bbox_dataset(dataset, n_fg_class, n_example=None)[source]

Checks if a dataset satisfies the bounding box dataset API.

This function checks if a given dataset satisfies the bounding box dataset API or not. If the dataset does not satisfy the API, this function raises an AssertionError.

Parameters
  • dataset – A dataset to be checked.

  • n_fg_class (int) – The number of foreground classes.

  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.

assert_is_image
chainercv.utils.assert_is_image(img, color=True, check_range=True)[source]

Checks if an image satisfies image format.

This function checks if a given image satisfies the image format or not. If the image does not satisfy the format, this function raises an AssertionError.

Parameters
  • img (ndarray) – An image to be checked.

  • color (bool) – A boolean that determines the expected channel size. If it is True, the number of channels should be 3. Otherwise, it should be 1. The default value is True.

  • check_range (bool) – A boolean that determines whether the range of values is checked or not. If it is True, the values of the image must be in \([0, 255]\). Otherwise, this function does not check the range. The default value is True.

assert_is_instance_segmentation_dataset
chainercv.utils.assert_is_instance_segmentation_dataset(dataset, n_fg_class, n_example=None)[source]

Checks if a dataset satisfies instance segmentation dataset APIs.

This function checks if a given dataset satisfies instance segmentation dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an AssertionError.

Parameters
  • dataset – A dataset to be checked.

  • n_fg_class (int) – The number of foreground classes.

  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.

assert_is_label_dataset
chainercv.utils.assert_is_label_dataset(dataset, n_class, n_example=None, color=True)[source]

Checks if a dataset satisfies the label dataset API.

This function checks if a given dataset satisfies the label dataset API or not. If the dataset does not satisfy the API, this function raises an AssertionError.

Parameters
  • dataset – A dataset to be checked.

  • n_class (int) – The number of classes.

  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.

  • color (bool) – A boolean that determines the expected channel size. If it is True, the number of channels should be 3. Otherwise, it should be 1. The default value is True.

assert_is_point
chainercv.utils.assert_is_point(point, visible=None, size=None, n_point=None)[source]

Checks if points satisfy the format.

This function checks if given points satisfy the format and raises an AssertionError when the points violate the convention.

Parameters
  • point (ndarray) – Points to be checked.

  • visible (ndarray) – Visibility of the points. If this is None, all points are regarded as visible.

  • size (tuple of ints) – The size of an image. If this argument is specified, the coordinates of visible points are checked to be within the image.

  • n_point (int) – If specified, the number of points in each object is expected to be n_point.

assert_is_point_dataset
chainercv.utils.assert_is_point_dataset(dataset, n_point=None, n_example=None, no_visible=False)[source]

Checks if a dataset satisfies the point dataset API.

This function checks if a given dataset satisfies the point dataset API or not. If the dataset does not satisfy the API, this function raises an AssertionError.

Parameters
  • dataset – A dataset to be checked.

  • n_point (int) – The number of expected points per image. If this is None, the number of points per image can be arbitrary.

  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.

  • no_visible (bool) – If True, we assume that visible is never contained. If False, visible may or may not be contained.

assert_is_semantic_segmentation_dataset
chainercv.utils.assert_is_semantic_segmentation_dataset(dataset, n_class, n_example=None)[source]

Checks if a dataset satisfies semantic segmentation dataset APIs.

This function checks if a given dataset satisfies semantic segmentation dataset APIs or not. If the dataset does not satisfy the APIs, this function raises an AssertionError.

Parameters
  • dataset – A dataset to be checked.

  • n_class (int) – The number of classes including background.

  • n_example (int) – The number of examples to be checked. If this argument is specified, this function picks examples randomly and checks them. Otherwise, this function checks all examples.

generate_random_bbox
chainercv.utils.generate_random_bbox(n, img_size, min_length, max_length)[source]

Generate valid bounding boxes with random position and shape.

Parameters
  • n (int) – The number of bounding boxes.

  • img_size (tuple) – A tuple of length 2. The height and the width of the image on which bounding boxes locate.

  • min_length (float) – The minimum length of edges of bounding boxes.

  • max_length (float) – The maximum length of edges of bounding boxes.

Returns

Coordinates of bounding boxes. Its shape is \((R, 4)\). Here, \(R\) equals n. The second axis contains \(y_{min}, x_{min}, y_{max}, x_{max}\), where \(min\_length \leq y_{max} - y_{min} < max\_length\) and \(min\_length \leq x_{max} - x_{min} < max\_length\).

Return type

numpy.ndarray
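
generate_random_bbox() pairs naturally with the assertion utilities above; a minimal sketch:

>>> from chainercv.utils import assert_is_bbox, generate_random_bbox
>>> bbox = generate_random_bbox(8, (480, 640), 16, 128)
>>> bbox.shape  # (8, 4)
>>> assert_is_bbox(bbox, size=(480, 640))  # passes silently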

Naming Conventions

Here are the notations used.

  • \(B\) is the size of a batch.

  • \(H\) is the height of an image.

  • \(W\) is the width of an image.

  • \(C\) is the number of channels.

  • \(R\) is the total number of instances in an image.

  • \(L\) is the number of classes.

Data objects

Images

  • imgs: \((B, C, H, W)\) or \([(C, H, W)]\)

  • img: \((C, H, W)\)

Note

image is used in the name of a function or a class (e.g., chainercv.utils.write_image()).

Bounding boxes

  • bboxes: \((B, R, 4)\) or \([(R, 4)]\)

  • bbox: \((R, 4)\)

  • bb: \((4,)\)

Labels

  • labels: \((B,)\) for classification, \((B, R)\) or \([(R,)]\) for detection and instance segmentation, \((B, H, W)\) for semantic segmentation

  • label: \(()\) for classification, \((R,)\) for detection and instance segmentation, \((H, W)\) for semantic segmentation

  • l or lb: \(()\)

Scores and probabilities

score represents an unbounded confidence value. On the other hand, probability is bounded in [0, 1] and sums to 1.

  • scores or probs: \((B, L)\) for classification, \((B, R, L)\) or \([(R, L)]\) for detection and instance segmentation, \((B, L, H, W)\) for semantic segmentation

  • score or prob: \((L,)\) for classification, \((R, L)\) for detection and instance segmentation, \((L, H, W)\) for semantic segmentation

  • sc or pb: \((L,)\)

Note

Objects that satisfy the definition of probability may still be named score.

Instance segmentations

  • masks: \((B, R, H, W)\) or \([(R, H, W)]\)

  • mask: \((R, H, W)\)

  • msk: \((H, W)\)

Attributing an additional meaning to a basic data object

RoIs

  • rois: \((R', 4)\), which consists of bounding boxes for multiple images. Assuming that there are \(B\) images each containing \(R_i\) bounding boxes, the formula \(R' = \sum_i R_i\) holds.

  • roi_indices: An array of shape \((R',)\) that contains batch indices of images to which bounding boxes correspond.

  • roi: \((R, 4)\). These are the RoIs for a single image (see the sketch below).
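
As a sketch with arbitrary toy values, two bounding boxes from the first image and one from the second give:

>>> import numpy as np
>>> rois = np.array(
...     [[10, 10, 20, 20], [30, 30, 60, 60], [5, 5, 50, 50]],
...     dtype=np.float32)  # R' = 2 + 1 = 3
>>> roi_indices = np.array([0, 0, 1], dtype=np.int32)  # image index of each RoI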

Attributes associated to RoIs

RoIs may have additional attributes, such as class scores and masks. These attributes are named by prefixing roi_ (e.g., a scores-like object is named roi_scores).

  • roi_xs: \((R',) + x_{shape}\)

  • roi_x: \((R,) + x_{shape}\)

In the case of scores with shape \((L,)\), roi_xs would have shape \((R', L)\).

Note

When the batch size is 1, roi_nouns = roi_noun = noun; using these names interchangeably is fine.

Class-wise vs class-independent

cls_nouns is a multi-class version of nouns. For instance, cls_locs is \((B, R, L, 4)\) and locs is \((B, R, 4)\).

Note

cls_probs and probs can be used interchangeably in the case when there is no confusion.

Arbitrary input

x is a variable whose shape can be inferred from the context. It can be used only when there is no confusion on its shape. This is usually the case when naming an input to a neural network.

License

Source Code

The source code of ChainerCV is licensed under the MIT License.

Pretrained Models

The pretrained models provided by ChainerCV benefit from external resources. Before using a model with pretrained weights, check the terms of use of the resource from which the weights were derived. The models concerned are:

  • ResNet50/101/152 (imagenet)

  • SEResNet50/101/152 (imagenet)

  • SEResNeXt50/101 (imagenet)

  • VGG16 (imagenet)

  • FasterRCNNVGG16 (imagenet)

  • FasterRCNNVGG16 (voc07/voc0712)

  • SSD300/SSD512 (imagenet)

  • SSD300/SSD512 (voc0712)

  • YOLOv2 (voc0712)

  • YOLOv3 (voc0712)

  • PSPNetResNet101 (cityscapes)

  • SegNetBasic (camvid)

  • FCISResNet101 (sbd)
