pytest-nodev

Version: 1.0.1
Date: 2016-07-21

Test-driven source code search for Python.

New to the concept of test-driven code search? Jump to the Quickstart for a 2-minute hands-on overview. Curious about the technique? Head over to the Concepts section or download our Starter kit. The User’s guide documents pytest-nodev usage in detail and covers a few more examples.

If you have any feedback or you want to help out, head over to our main repository: https://github.com/nodev-io/pytest-nodev

Quickstart

New user FAQ

pytest-nodev is a simple test-driven search engine for Python code: it finds classes and functions that match the behaviour specified by the given tests.

How does “test-driven code search” work?

To be more precise, pytest-nodev is a pytest plugin that lets you execute a set of tests that specify the expected behaviour of a class or function on all objects in the Python standard library and in all the modules you have installed.
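The mechanics can be pictured with a toy loop (plain Python, not the plugin's actual implementation): run the specification test against every candidate object, treat any exception as an expected failure, and keep the names of the candidates that pass. The `search` and `spec` names below are just illustrative.

```python
# Toy illustration of test-driven search (not pytest-nodev internals):
# run one specification test against every candidate object and
# collect the names of those that pass.
def search(test, candidates):
    hits = []
    for name, obj in candidates.items():
        try:
            test(obj)
        except Exception:
            continue  # candidate does not match the spec ("xfail")
        hits.append(name)
    return hits

# A reduced parse_bool specification and a tiny search space of builtins.
def spec(parse_bool):
    assert not parse_bool('0')
    assert parse_bool('1')

candidates = {'builtins:int': int, 'builtins:len': len, 'builtins:str': str}
assert search(spec, candidates) == ['builtins:int']
```

Here `int` passes the reduced specification (`int('0')` is falsy, `int('1')` is truthy) while `len` and `str` raise AssertionError and are recorded as expected failures, which mirrors the xfail-based reporting described below.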

Show me how it works in practice. I need to write a parse_bool function that robustly parses a boolean value from a string. Here is the test I intend to use to validate my own implementation once I write it:

def test_parse_bool():
    assert not parse_bool('false')
    assert not parse_bool('FALSE')
    assert not parse_bool('0')

    assert parse_bool('true')
    assert parse_bool('TRUE')
    assert parse_bool('1')

First, install the latest version of pytest-nodev from the Python Package Index:

$ pip install pytest-nodev

Then copy your specification test to the test_parse_bool.py file and decorate it with pytest.mark.candidate as follows:

import pytest

@pytest.mark.candidate('parse_bool')
def test_parse_bool():
    assert not parse_bool('false')
    assert not parse_bool('FALSE')
    assert not parse_bool('0')

    assert parse_bool('true')
    assert parse_bool('TRUE')
    assert parse_bool('1')

Finally, instruct pytest to run your test on all candidate callables in the Python standard library:

$ py.test --candidates-from-stdlib test_parse_bool.py
======================= test session starts ==========================
platform darwin -- Python 3.5.1, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /tmp, inifile: setup.cfg
plugins: nodev-1.0.0, timeout-1.0.0
collected 4000 items

test_parse_bool.py xxxxxxxxxxxx[...]xxxxxxxxXxxxxxxxx[...]xxxxxxxxxxxx

====================== pytest_nodev: 1 passed ========================

test_parse_bool.py::test_parse_bool[distutils.util:strtobool] PASSED

=== 3999 xfailed, 1 xpassed, 260 pytest-warnings in 75.38 seconds ====

In just over a minute pytest-nodev collected 4000 functions from the standard library, ran your specification test on all of them, and reported that the strtobool function in the distutils.util module is the only candidate that passes your test.

Now you can review it and, if you like it, use it in your code. No need to write your own implementation!
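As a bonus, strtobool accepts more spellings than the specification asked for, such as 'yes', 'no', 'on' and 'off'. Here is a minimal sketch of its behaviour for review purposes (this is not the stdlib source, just an equivalent re-statement):

```python
# Sketch of distutils.util.strtobool's behaviour (not the stdlib source):
# map common truthy/falsy strings to 1 and 0, raise on anything else.
def strtobool(val):
    val = val.lower()
    if val in ('y', 'yes', 't', 'true', 'on', '1'):
        return 1
    if val in ('n', 'no', 'f', 'false', 'off', '0'):
        return 0
    raise ValueError('invalid truth value %r' % (val,))

assert strtobool('TRUE') == 1
assert strtobool('false') == 0
assert strtobool('on') == 1  # extra functionality the spec did not require
```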

Wow! Does it work that well every time?

To be honest, strtobool is a little-known gem of the Python standard library that is just perfect for illustrating all the benefits of test-driven code search. Here are some of them, in rough order of importance:

  • a function imported is one less function coded—and tested, documented, debugged, ported, maintained...
  • it’s battle-tested code—lots of old bugs have already been squashed
  • it’s other people’s code—there’s an upstream to report new bugs to
  • it gives you additional useful functionality—for free on top of that
  • it’s in the Python standard library—no additional dependency required

BIG FAT WARNING!

Searching code with pytest-nodev looks very much like running arbitrary callables with random arguments. Many functions called with the wrong arguments may have unexpected consequences, ranging from slightly annoying, think os.mkdir('false'), to utterly catastrophic, think shutil.rmtree('/', True). Serious use of pytest-nodev, in particular using --candidates-from-all, requires running the tests with operating-system-level isolation, e.g. as a dedicated user or, even better, inside a dedicated container. The Starter kit guide documents how to run pytest-nodev safely and efficiently.

Project resources

Documentation http://pytest-nodev.readthedocs.io
Support https://stackoverflow.com/search?q=pytest-nodev
Development https://github.com/nodev-io/pytest-nodev
Discussion To be decided, see issue #15
Download https://pypi.python.org/pypi/pytest-nodev
Code quality Build Status on Travis CI Build Status on AppVeyor Coverage Status on Coveralls

Contributing

Contributions are very welcome. Please see the CONTRIBUTING document for the best way to help. If you encounter any problems, please file an issue along with a detailed description.

Authors:

Contributors:

Sponsors:

  • B-Open Solutions srl

License

pytest-nodev is free and open source software distributed under the terms of the MIT license.

Starter kit

nodev-starter-kit lets you perform test-driven code search queries with pytest-nodev safely and efficiently using Docker.

Why do I need special care to run pytest-nodev?

Searching code with pytest-nodev looks very much like running arbitrary callables with random arguments. Many functions called with the wrong arguments may have unexpected consequences, ranging from slightly annoying, think os.mkdir('false'), to utterly catastrophic, think shutil.rmtree('/', True). Serious use of pytest-nodev, in particular using --candidates-from-all, requires running the tests with operating-system-level isolation, e.g. as a dedicated user or, even better, inside a dedicated container.

But isn’t Docker overkill? Can’t I just use a dedicated user to run pytest-nodev?

We tried hard to find a simpler setup, but once all the nitty-gritty details are factored in, we chose Docker as the best trade-off between safety, reproducibility and ease of use.

Install nodev-starter-kit

To install nodev-starter-kit clone the official repo:

$ git clone https://github.com/nodev-io/nodev-starter-kit.git
$ cd nodev-starter-kit

Advanced GitHub users may prefer to fork the official repo and clone their fork.

Install docker-engine and docker

In order to run pytest-nodev you need access to a docker-engine server via the docker client. If you don’t have Docker set up already, follow the official installation instructions for your platform.

On Ubuntu 16.04 only, you can use the script we provide:

$ bash ./docker-engine-setup.sh

And test your setup with:

$ docker info

Refer to the official Docker documentation for troubleshooting and additional configuration.

Create the nodev image

The nodev docker image will be your search engine; it needs to be created once and updated every time you want to change the packages installed in the search-engine environment.

With an editor, fill the requirements.txt file with the packages to be installed in the search engine.
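For example, a requirements.txt listing a couple of packages whose objects should become search candidates might look like this (the package names and version pins are only placeholders, pick your own):

```text
# packages whose objects will become search candidates
requests==2.10.0
python-dateutil==2.5.3
```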

Build the docker image with:

$ docker build -t nodev .

User’s guide

Warning

This section is a work in progress and some areas are lacking.

Installation

Install the latest version of pytest-nodev from the Python Package Index:

$ pip install pytest-nodev

Basic usage

Write a specification test instrumented with the candidate fixture in the test_example.py file. Run pytest with one of the --candidates-from-* options to select the search space, e.g. to search in the Python standard library:

$ py.test --candidates-from-stdlib test_example.py

Advanced usage

Use of --candidates-from-all may be very dangerous, so it is disabled by default; the preferred way to search safely and efficiently is documented in the Starter kit section.

If you are sure you understand the risks and have set up appropriate mitigation strategies, you can enable --candidates-from-all by setting the PYTEST_NODEV_MODE environment variable to FEARLESS:

$ PYTEST_NODEV_MODE=FEARLESS py.test --candidates-from-all test_example.py

Command line reference

The plugin adds the following options to the pytest command line:

$ py.test --help
[...]
nodev:
  --candidates-from-stdlib
                        Collects candidates from the Python standard library.
  --candidates-from-all
                        Collects candidates from the Python standard library
                        and all installed packages. Disabled by default, see
                        the docs.
  --candidates-from-specs=CANDIDATES_FROM_SPECS=[CANDIDATES_FROM_SPECS=...]
                        Collects candidates from installed packages. Space
                        separated list of `pip` specs.
  --candidates-from-modules=CANDIDATES_FROM_MODULES=[CANDIDATES_FROM_MODULES=...]
                        Collects candidates from installed modules. Space
                        separated list of module names.
  --candidates-includes=CANDIDATES_INCLUDES=[CANDIDATES_INCLUDES=...]
                        Space separated list of regexes matching full object
                        names to include, defaults to include all objects
                        collected via `--candidates-from-*`.
  --candidates-excludes=CANDIDATES_EXCLUDES=[CANDIDATES_EXCLUDES=...]
                        Space separated list of regexes matching full object
                        names to exclude.
  --candidates-predicate=CANDIDATES_PREDICATE
                        Full name of the predicate passed to
                        `inspect.getmembers`, defaults to `builtins.callable`.
  --candidates-fail     Show candidates failures.
[...]

Concepts

Motivation

“Have a look at this piece of code that I’m writing–I’m sure it has been written before. I wouldn’t be surprised to find it verbatim somewhere on GitHub.” - @kr1

Every piece of functionality in a software project requires code that lies somewhere on the wide reusability spectrum that goes from extremely custom and strongly tied to the specific implementation, to completely generic and highly reusable.

On the custom side of the spectrum there is all the code that defines the features of the software and all the choices of its implementation. That is code that needs to be written.

On the other hand, seasoned software developers are trained to spot pieces of functionality that lie far enough on the generic side of the range that, with high probability, they are already implemented in a library or a framework and documented well enough to be discovered with a keyword-based search, e.g. on StackOverflow or Google.

In between the two extremes there is a huge gray area populated by pieces of functionality that are not generic enough to obviously deserve a place in a library, but are common enough that they must already have been implemented by someone else for their software. This kind of code is doomed to be re-implemented again and again, for the simple reason that there is no way to search code by functionality...

Or is it?

Test-driven code reuse

Test-driven reuse (TDR) is an extension of the well-known test-driven development (TDD) practice.

Developing a new feature in TDR starts with the developer writing the tests that will validate the correct implementation of the desired functionality.

Before writing any functional code the tests are run against all functions and classes of all available projects.

Any code passing the tests is presented to the developer as a candidate implementation for the target feature:

  • if nothing passes the tests, the developer needs to implement the feature and TDR reduces to TDD
  • if any code passes the tests the developer can:
    • import: accept code as a dependency and use the class / function directly
    • fork: copy the code and the related tests into their project
    • study: use the code and the related tests as guidelines for their implementation, in particular identifying corner cases and optimizations

Unit tests validation

An independent use case for test-driven code search is unit-test validation. If a test passes with an unexpected object, there are two possibilities: either the test is not strict enough, allows false positives and needs to be updated, or the passing candidate is actually a function you could use instead of your own implementation.
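To make this concrete, here is a toy illustration in plain Python (no pytest-nodev involved; `loose_spec` and `strict_spec` are hypothetical names) of a specification that is too loose, and of the stricter revision that removes the false positive:

```python
# A specification that is too loose: both len and max satisfy it,
# so a search would report len as a false positive for "maximum".
def loose_spec(f):
    return f([3, 1, 2]) == 3

assert loose_spec(len) and loose_spec(max)

# Adding one more case disambiguates: only max still passes.
def strict_spec(f):
    return f([3, 1, 2]) == 3 and f([5, 1, 2]) == 5

assert strict_spec(max) and not strict_spec(len)
```

Whether you then tighten the test or adopt the surprising candidate depends on which of the two possibilities above applies.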

Bibliography

  • “CodeGenie: a tool for test-driven source code search”, O.A. Lazzarini Lemos et al., Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications, 917–918, 2007, ACM, http://dx.doi.org/10.1145/1297846.1297944
  • “Code conjurer: Pulling reusable software out of thin air”, O. Hummel et al., IEEE Software, 25 (5), 45–52, 2008, IEEE, http://dx.doi.org/10.1109/MS.2008.110
  • “Finding Source Code on the Web for Remix and Reuse”, S.E. Sim et al., 2013
  • “Test-Driven Reuse: Improving the Selection of Semantically Relevant Code”, M. Nurolahzade, Ph.D. thesis, 2014, University of Calgary

Contributing

This project is Free and Open Source Software released under the terms of the MIT license. Contributions are highly welcomed and appreciated. Every little help counts, so do not hesitate!

Report a bug

If you encounter any problems, please file a bug report in the project issue tracker along with a detailed description.

Submit a pull request

Contributors are invited to review the product high level design and the short term product planning.

Tests can be run with pytest with:

$ py.test -v --timeout=0 --pep8 --flakes --mccabe --cov=pytest_nodev --cov-report=html \
    --cache-clear pytest_nodev tests

Coverage can be checked with:

$ open htmlcov/index.html

The tests for all supported Python versions can be run via tox with:

$ tox

Please ensure that coverage at least stays the same before you submit a pull request.

Documentation

The documentation is in reStructuredText format; you can build a local copy with:

$ sphinx-build docs docs/html
$ open docs/html/index.html

Design

This chapter documents the high-level design of the product and is intended for developers contributing to the project.

Note

Users of the product need not bother with the following. Unless they are curious :)

Mission and vision

The project mission is to enable test-driven code search for Python with pytest.

Target use cases:

  1. test-driven reuse
  2. tests validation

Project goals:

  1. collect all possible Python live objects (modules, functions, classes, singletons, constants...)
  2. enable flexible search space definition
  3. let users turn normal tests into specification tests, and vice versa, with minimal effort

Project non-goals:

  1. protect the user from unintended consequences (clashes with goal 1.); instead, document how to use OS-level isolation/containerization
  2. help users write implementation-independent specification tests (think a contains function that also tests inside dict values and class attributes)

Software architecture

Logical components:

  • the collector of candidate objects, with filtering
  • the test runner, via the pytest plugin interface
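The collector component can be pictured as a thin wrapper around inspect.getmembers with the predicate that the --candidates-predicate option exposes (builtins.callable by default). A hedged sketch, not pytest-nodev's actual implementation:

```python
import inspect
import json  # an example module to scan

# Sketch of a candidate collector (not pytest-nodev's actual code):
# enumerate matching objects in a module via inspect.getmembers,
# skipping underscore-prefixed internals as pytest-nodev does.
def collect_candidates(module, predicate=callable):
    return {
        '%s:%s' % (module.__name__, name): obj
        for name, obj in inspect.getmembers(module, predicate)
        if not name.startswith('_')
    }

candidates = collect_candidates(json)
assert 'json:loads' in candidates and 'json:dumps' in candidates
```

The `module:name` keys mirror the full object names shown in the quickstart report, e.g. `distutils.util:strtobool`.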

Version goals

This project strives to adhere to semantic versioning.

1.0.0 (upcoming release)

Minimal set of features to be operationally useful and to showcase the nodev approach. Reasonably safe to test, but not safe to use without OS-level isolation. No completeness and no performance guarantees.

  • Search environment definition:
    • Support defining which modules to search. Command line --candidates-from-* options.
    • Support defining which objects to include/exclude by name or via a predicate test function. Command line --candidates-includes/excludes/predicate options.
  • Object collection:
    • Collect most objects from the defined environment. It is ok to miss some objects for now.
  • Test execution:
    • Execute tests instrumented with the candidate fixture once for every collected object. The tests are marked xfail unless the --candidates-fail command line option is given, so that standard pytest reporting stays as useful as possible.
  • Report:
    • Report which objects pass each test.
  • Safety:
    • Interrupting hanging tests is delegated to pytest-timeout.
    • Internal modules and objects starting with an underscore are excluded.
    • Potentially dangerous, crashing, hard-hanging or simply annoying objects in the standard library are unconditionally blacklisted, so that new users can try --candidates-from-stdlib without bothering with OS-level isolation.
    • Limited use of --candidates-from-all.
  • Documentation:
    • Enough to inspire and raise interest in new users.
    • Enough to use it effectively and safely. Give a strategy to get OS-level isolation.