runtest

Numerically tolerant end-to-end test library for research software.

This documents the latest code on the main branch. The release-1.3.z code is documented here: http://runtest.readthedocs.io/en/release-1.3.z/.

Motivation

Scope

When testing numerical codes against functionality regression, you typically cannot use a plain diff against the reference outputs due to numerical noise in the digits and because there may be many numbers that change all the time and that you do not want to test (e.g. date and time of execution).

The aim of this library is to make the testing and maintenance of tests easy. The library allows to extract portions of the program output(s) which are automatically compared to reference outputs with a relative or absolute numerical tolerance to compensate for numerical noise due to machine precision.

Design decisions

The library is designed to play well with CTest, to be convenient when used interactively, and to work without trouble on Linux, Mac, and Windows. It offers a basic argument parsing for test scripts.

Audience

Explain runtest in one sentence

Runtest will assist you in running an entire calculation/simulation, extracting portions for the simulation outputs, and comparing these portions with reference outputs and scream if the results have changed above a predefined numerical tolerance.

When should one use runtest?

  • You compute numerical results.
  • You want a library that understands that floating point precision is limited.
  • You want to be able to update tests by updating reference outputs.
  • You look for an end-to-end testing support.

When should one not use runtest?

  • You look for a unit test library which tests single functions. Much better alternatives exist for this.

Similar projects

General tips

How to add a new test

Test scripts are python scripts which return zero (success) or non-zero (failure). You define what success or failure means. The runtest library helps you with basic tasks but you are free to go beyond and define own tests with arbitrary complexity.

Strive for portability

Avoid shell programming or symlinks in test scripts otherwise the tests are not portable to Windows. Therefore do not use os.system() or os.symlink(). Do not use explicit forward slashes for paths, instead use os.path.join().

Always test that the test really works

It is easy to make a mistake and create a test which is always “successful”. Test that your test catches mistakes. Verify whether it extracts the right numbers.

Never commit functionality to the main development line without tests

If you commit functionality to the main development line without tests then this functionality will break sooner or later and we have no automatic mechanism to detect it. Committing new code without tests is bad karma.

Never add inputs to the test directories which are never run

We want all inputs and outputs to be accessile by the default test suite. Otherwise we have no automatic way to detect that some inputs or outputs have degraded. Degraded inputs and outputs are useless and confusing.

How to hook up runtest with your code

The runtest library is a low-level program-independent library that provides infrastructure for running calculations and extracting and comparing numbers against reference outputs. The library does not know anything about your code.

In order to tell the library how to run your code, the library requires that you define a configure function which defines how to handle a list of input files and extra arguments. This configure function also defines the launcher script or binary for your code, the full launch command, the output prefix, and relative reference path where reference outputs are stored. The output prefix can also be None.

Here is an example module runtest_config.py which defines such a function:

def configure(options, input_files, extra_args):
    """
    This function is used by runtest to configure runtest
    at runtime for code specific launch command and file naming.
    """

    from os import path
    from sys import platform

    launcher = 'pam'
    launcher_full_path = path.normpath(path.join(options.binary_dir, launcher))

    (inp, mol) = input_files

    if platform == "win32":
        exe = 'dirac.x.exe'
    else:
        exe = 'dirac.x'

    command = []
    command.append('python {0}'.format(launcher_full_path))
    command.append('--dirac={0}'.format(path.join(options.binary_dir, exe)))
    command.append('--noarch --nobackup')
    command.append('--inp={0} --mol={1}'.format(inp, mol))
    if extra_args is not None:
        command.append(extra_args)

    full_command = ' '.join(command)

    inp_no_suffix = path.splitext(inp)[0]
    mol_no_suffix = path.splitext(mol)[0]

    output_prefix = '{0}_{1}'.format(inp_no_suffix, mol_no_suffix)

    relative_reference_path = 'result'

    return launcher, full_command, output_prefix, relative_reference_path

The function is expected to return launcher, full_command, output_prefix, and relative_reference_path.

Example test script

Let us consider a relatively simple annotated example.

First we import modules that we need (highlighted lines):

#!/usr/bin/env python

# provides os.path.join
import os

# provides exit
import sys

# we make sure we can import runtest and runtest_config
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

# we import essential functions from the runtest library
from runtest import version_info, get_filter, cli, run

# this tells runtest how to run your code
from runtest_config import configure

# we stop the script if the major version is not compatible
assert version_info.major == 2

# construct a filter list which contains two filters
f = [
    get_filter(from_string='@   Elements of the electric dipole',
               to_string='@   anisotropy',
               rel_tolerance=1.0e-5),
    get_filter(from_string='************ Expectation values',
               to_string='s0 = T : Expectation value',
               rel_tolerance=1.0e-5),
]

# invoke the command line interface parser which returns options
options = cli()

ierr = 0
for inp in ['PBE0gracLB94.inp', 'GLLBsaopLBalpha.inp']:
    for mol in ['Ne.mol']:
        # the run function runs the code and filters the outputs
        ierr += run(options,
                    configure,
                    input_files=[inp, mol],
                    filters={'out': f})

sys.exit(ierr)

Then we construct a list of filters. We can construct as many lists as we like and they can contain as many filters as we like. The list does not have to be called “f”. Give it a name that is meaningful to you.

#!/usr/bin/env python

# provides os.path.join
import os

# provides exit
import sys

# we make sure we can import runtest and runtest_config
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))

# we import essential functions from the runtest library
from runtest import version_info, get_filter, cli, run

# this tells runtest how to run your code
from runtest_config import configure

# we stop the script if the major version is not compatible
assert version_info.major == 2

# construct a filter list which contains two filters
f = [
    get_filter(from_string='@   Elements of the electric dipole',
               to_string='@   anisotropy',
               rel_tolerance=1.0e-5),
    get_filter(from_string='************ Expectation values',
               to_string='s0 = T : Expectation value',
               rel_tolerance=1.0e-5),
]

# invoke the command line interface parser which returns options
options = cli()

ierr = 0
for inp in ['PBE0gracLB94.inp', 'GLLBsaopLBalpha.inp']:
    for mol in ['Ne.mol']:
        # the run function runs the code and filters the outputs
        ierr += run(options,
                    configure,
                    input_files=[inp, mol],
                    filters={'out': f})

sys.exit(ierr)

After we use the command line interface to generate options, we really run the test. Note how we pass the configure function to the run function. Also note how we pass the filter list as a dictionary. If we omit to pass it, then the calculations will be run but not verified. This is useful for multi-step jobs. From the dictionary, the library knows that it should execute the filter list “f” on output files with the suffix “out”. It is no problem to apply different filters to different output files, for this add entries to the filters dictionary.

Run function arguments

The run function has the following signature:

def run(options,
        configure,
        input_files,
        extra_args=None,
        filters=None,
        accepted_errors=None):
    ...

options is set by the command line interface (by the user executing runtest).

configure is specific to the code at hand (see the Example test script).

input_files contains the input files passed to the code launcher. The data structure of input_files is set by the configure function (in other words by the code using runtest).

There are three more optional arguments to the run function which by default are set to None:

extra_args contains extra arguments. Again, its data structure of is set by the configure function (in other words by the code using runtest).

filters is a dictionary of suffix and filter list pairs and contains filters to apply to the results. If we omit to pass it, then the calculations will be run but not verified. This is useful for multi-step jobs. See also the Example test script. If the output_prefix in the configure function is set to None, then the filters are applied to the file names literally.

Filter options

Relative tolerance

There is no default. You have to select either relative or absolute tolerance for each test when testing floats. You cannot select both at the same time.

In this example we set the relative tolerance to 1.0e-10:

get_filter(from_string='Electronic energy',
           num_lines=8,
           rel_tolerance=1.0e-10)

Absolute tolerance

There is no default. You have to select either relative or absolute tolerance for each test when testing floats. You cannot select both at the same time.

In this example we set the absolute tolerance to 1.0e-10:

get_filter(from_string='Electronic energy',
           num_lines=8,
           abs_tolerance=1.0e-10)

How to check entire file

By default all lines are tested so if you omit any string anchors and number of lines we will compare numbers from the entire file.

Example:

get_filter(rel_tolerance=1.0e-10)

Filtering between two anchor strings

Example:

get_filter(from_string='@   Elements of the electric dipole',
           to_string='@   anisotropy',
           rel_tolerance=1.0e-10)

This will extract all floats between these strings including the lines of the strings.

The start/end strings can be regular expressions, for this use from_re or to_re. Any combination containing from_string/from_re and to_string/to_re is possible.

Filtering a number of lines starting with string/regex

Example:

get_filter(from_string='Electronic energy',
           num_lines=8,  # here we compare 8 lines
           abs_tolerance=1.0e-10)

The start string can be a string (from_string) or a regular expression (from_re). In the above example we extract and compare all lines that start with ‘Electronic energy’ including the following 7 lines.

Extracting single lines

This example will compare all lines which contain ‘Electronic energy’:

get_filter(string='Electronic energy',
           abs_tolerance=1.0e-10)

This will match the string in a case-sensitive fashion.

Instead of single string we can give a single regular expression (re).

get_filter(re='Electronic energy',
           abs_tolerance=1.0e-10)

Regexes follow the Python syntax. For example, to match in a case-insensitive fashion:

get_filter(re=r'(?i)Electronic energy',
           abs_tolerance=1.0e-10)

It is not possible to use Python regex objects directly.

How to ignore sign

Sometimes the sign is not predictable. For this set ignore_sign=True.

How to ignore the order of numbers

Setting ignore_order=True will sort the numbers (as they appear consecutively between anchors, one after another) before comparing them. This is useful for tests where some numbers can change place.

How to ignore very small or very large numbers

You can ignore very small numbers with skip_below. Default is 1.0e-40. Ignore all floats that are smaller than this number (this option ignores the sign).

As an example consider the following result tensor:

3716173.43448289          0.00000264         -0.00000346
     -0.00008183      75047.79698485          0.00000328
      0.00003493         -0.00000668      75047.79698251

      0.00023164    -153158.24017016         -0.00000493
  90142.70952070         -0.00000602          0.00000574
      0.00001946         -0.00000028          0.00000052

      0.00005844         -0.00000113    -153158.24017263
     -0.00005667          0.00000015         -0.00000022
  90142.70952022          0.00000056          0.00000696

The small numbers are actually numerical noise and we do not want to test them at all. In this case it is useful to set skip_below=1.0e-4.

Alternatively one could use absolute tolerance to avoid checking the noisy zeros.

You can ignore very large numbers with skip_above (also this option ignores the sign).

How to ignore certain numbers

The keyword mask is useful if you extract lines which contain both interesting and uninteresting numbers (like timings which change from run to run).

Example:

get_filter(from_string='no.    eigenvalue (eV)   mean-res.',
           num_lines=4,
           rel_tolerance=1.0e-4,
           mask=[1, 2, 3])

Here we use only the first 3 floats in each line. Counting starts with 1.

Command-line arguments

-h, –help

Show help message and exit.

-b BINARY_DIR, –binary-dir=BINARY_DIR

Directory containing the binary/launcher. By default it is the directory of the test script which is executed.

-w WORK_DIR, –work-dir=WORK_DIR

Working directory where all generated files will be written to. By default it is the directory of the test script which is executed.

-l LAUNCH_AGENT, –launch-agent=LAUNCH_AGENT

Prepend a launch agent command (e.g. “mpirun -np 8” or “valgrind –leak-check=yes”). By default no launch agent is prepended.

-v, –verbose

Give more verbose output upon test failure (by default False).

-s, –skip-run

Skip actual calculation(s), only compare numbers. This is useful to adjust the test script for long calculations.

-n, –no-verification

Run calculation(s) but do not verify results. This is useful to generate outputs for the first time.

Generated files

The test script generates three files per run with the suffixes “.diff”, “.filtered”, and “.reference”.

The “.filtered” file contains the extracted numbers from the present run.

The “.reference” file contains the extracted numbers from the reference file.

If the test passes, the “.diff” file is an empty file. If the test fails, it contains information about the difference between the present run and the reference file.

Contributing

Yes please! Please follow this excellent guide: http://www.contribution-guide.org. We do not require any formal copyright assignment or contributor license agreement. Any contributions intentionally sent upstream are presumed to be offered under terms of the Mozilla Public License Version 2.0.

Methods, and variables that start with underscore are private.

Please keep the default output as silent as possible.

Where to contribute

Here are some ideas:

  • Improve documentation
  • Fix typos
  • Make it possible to install this package using pip
  • Make this package distributable via PyPI

Branching model

We follow the semantic branching model: https://dev-cafe.github.io/branching-model/