ZODB Shoot Out

This application measures and compares the performance of various ZODB storages and configurations. It is derived from the RelStorage speedtest script, but this version allows arbitrary storage types and configurations, provides more measurements, and produces numbers that are easier to interpret.

Installation

zodbshootout can be installed using pip:

pip install zodbshootout

This will install the zodbshootout script along with ZODB and ZEO. To test other storages (such as RelStorage) or storage wrappers (such as zc.zlibstorage) you’ll need to install those packages as well.

RelStorage

zodbshootout comes with extras that install RelStorage plus an appropriate database adapter/driver for a specific database:

pip install "zodbshootout[mysql]"
pip install "zodbshootout[postgresql]"
pip install "zodbshootout[oracle]"

Note

This does not actually install the databases. You will need to install those separately (possibly using your operating system’s package manager) and create user accounts as described in the RelStorage documentation.

Tip

This does not install the packages necessary for RelStorage to integrate with Memcache. See the RelStorage documentation for more information on the packages needed to test RelStorage and Memcache.

ZEO

When zodbshootout is installed, ZEO is also installed. To test ZEO’s performance, you’ll need to have the ZEO process running, as described in the ZEO documentation.

Running zodbshootout

Executable

zodbshootout can be executed in one of two ways. The first and most common is via the zodbshootout script created by pip or buildout:

# In an active environment with zodbshootout on the path
$ zodbshootout ...arguments...
# In a non-active virtual environment
$ path/to/venv/bin/zodbshootout ...arguments...

zodbshootout can also be directly invoked as a module using the python interpreter where it is installed:

python -m zodbshootout

This documentation will simply refer to the zodbshootout script, but both forms are equivalent.

Tip

For the most repeatable, stable results, it is important to choose a fixed value for the hash seed used for Python’s builtin objects (str, bytes, etc). On CPython 2, this means not passing the -R argument to the interpreter, and not having the PYTHONHASHSEED environment variable set to random. On CPython 3, this means having the PYTHONHASHSEED environment variable set to a fixed value. On a Unix-like system, this invocation will work for both versions:

$ PYTHONHASHSEED=0 zodbshootout ...arguments...

Configuration File

The zodbshootout script requires the name of a database configuration file. The configuration file contains a list of databases to test, in ZConfig format. The script then writes and reads each of the databases while taking measurements. During this process, the measured times are output for each test of each database; there are a number of command-line options to control the output or save it to files for later analysis. (See the pyperf user guide for information on configuring the output and adjusting the benchmark process.)

An example of a configuration file testing the built-in ZODB file storage, a few variations of ZEO, and RelStorage would look like this:

# This configuration compares a database running raw FileStorage
# (no ZEO), along with a database running FileStorage behind ZEO
# with a persistent ZEO cache, with some other databases.
#
# *This test can only run with a concurrency level of 1 if using
# multiple processes. To use higher concurrency levels, you need to
# use ``--threads``.*

%import relstorage

<zodb fs>
    <filestorage>
        path var/Data2.fs
    </filestorage>
</zodb>

<zodb zeofs_pcache>
    <zeoclient>
        server localhost:24003
        client 0
        var var
        cache-size 200000000
    </zeoclient>
</zodb>

<zodb zeo_fs>
    <zeoclient>
        server localhost:24003
    </zeoclient>
</zodb>

<zodb mysql_hf>
    <relstorage>
        keep-history false
        poll-interval 5
        <mysql>
            db relstoragetest_hf
            user relstoragetest
            passwd relstoragetest
        </mysql>
    </relstorage>
</zodb>

<zodb mysql_hf_mc>
    <relstorage>
        keep-history false
        poll-interval 5
        cache-module-name relstorage.pylibmc_wrapper
        cache-servers localhost:24005
        <mysql>
            db relstoragetest_hf
            user relstoragetest
            passwd relstoragetest
        </mysql>
    </relstorage>
</zodb>

The corresponding ZEO configuration file would look like this:

<zeo>
  address 24003
  read-only false
  invalidation-queue-size 100
  pid-filename var/zeo.pid
  # monitor-address PORT
  # transaction-timeout SECONDS
</zeo>

<filestorage 1>
  path var/Data.fs
</filestorage>
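
To put these files to use: assuming the ZEO configuration above is saved as zeo.conf and the database configuration as sample.conf (both names are only examples), a session might look like this, using the runzeo script installed with ZEO:

$ runzeo -C zeo.conf &
$ PYTHONHASHSEED=0 zodbshootout sample.conf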

Note

If you’ll be using RelStorage, you’ll need to have the appropriate RDBMS processes installed, running, and properly configured. Likewise, if you’ll be using ZEO, you’ll need to have the ZEO server running. For pointers to more information, see Installation.

Options

The zodbshootout script accepts the options described below. Run zodbshootout --help for the full usage text.


Changed in version 0.7: You can now specify just a subset of benchmarks to run by giving their names as extra command line arguments after the configuration file.
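
For example, assuming a configuration file named sample.conf, the following runs only the add and cold benchmarks (two of the benchmark names used in the results section below):

$ zodbshootout sample.conf add cold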

Objects

These options control the objects put in the database.

  • --object-counts specifies how many persistent objects to write or read per transaction. The default is 1000.

    Changed in version 0.7: The old alias of -n is no longer accepted; pyperf uses that to determine the number of loop iterations.

    Also, this can now only be used once.

    Changed in version 0.6: Specify this option more than once to run the tests with different object counts.

  • --btrees causes the data to be stored in BTrees optimized for ZODB usage (without this option, a PersistentMapping is used). This is an advanced option that may be useful when tuning particular applications and usage scenarios. It adds additional objects to manage the buckets that make up the BTree. However, if IO BTrees are used (the default when this option is specified), internal storage of keys as integers may reduce pickle times and sizes (and thus improve cache efficiency). This option can take an argument of either IO or OO to specify the type of BTree to use.

    This option is especially interesting on PyPy or when comparing the pure-Python implementation of BTrees to the C implementation.

    New in version 0.6.

  • --zap recreates the tables and indexes for a RelStorage database or a ZODB FileStorage. This option completely destroys any existing data. You will be prompted to confirm that you want to do this for each database that supports it. This is handy for comparing Python 2 and Python 3 (which can’t otherwise use the same database schemas).

    Caution

    This option destroys all data in the relevant database.

    Changed in version 0.7: You can now specify an argument of force to disable the prompt and zap all databases. You can also give a comma separated list of database names to zap; only those databases will be cleared (without prompting).

    New in version 0.6.

  • --min-objects ensures that at least the specified number of objects exist in the database independently of the objects being tested. If the database packs away objects or if --zap is used, this option will add back the necessary number of objects. If there are more objects, nothing will be done. This option is helpful for testing for scalability issues.

    New in version 0.7.

  • --blobs causes zodbshootout to read and write blobs instead of simple persistent objects. This can be useful for testing options like shared blob dirs on network filesystems, or RelStorage’s blob-chunk-size, or for diagnosing performance problems. If objects have to be added to meet the --min-objects count, they will also be blobs. Note that because of the way blobs work, there will be two times the number of objects stored as specified in --object-counts. Expect this option to cause the test to be much slower.

    New in version 0.7.
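
As an illustration, these options can be combined. The following hypothetical invocation (sample.conf is a placeholder name) writes transactions of 10000 objects stored in IO BTrees and ensures the database holds at least 100000 objects:

$ zodbshootout --object-counts 10000 --btrees IO --min-objects 100000 sample.conf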

Concurrency

These options control the concurrency of the testing.

  • -c (--concurrency) specifies how many tests to run in parallel. The default is 2. Each of the concurrent tests runs in a separate process to prevent contention over the CPython global interpreter lock. In single-host configurations, the performance measurements should increase with the concurrency level, up to the number of CPU cores in the computer. In more complex configurations, performance will be limited by other factors such as network latency.

    Changed in version 0.7: This option can only be used once.

    Changed in version 0.6: Specify this option more than once to run the tests with different concurrency levels.

  • --threads uses in-process threads for concurrency instead of multiprocessing. This can demonstrate how the GIL affects various database adapters under RelStorage, for instance. It can also demonstrate the difference that warmup time makes for things like PyPy’s JIT.

    By default or if you give the shared argument to this option, all threads will share one ZODB DB object and re-use Connections from the same pool; most threaded applications will use ZODB in this manner. If you specify the unique argument, then each thread will get its own DB object. In addition to showing how the thread locking strategy of the underlying storage affects things, this can also highlight the impact of shared caches.

    New in version 0.6.

  • --gevent monkey-patches the system and uses cooperative greenlet concurrency in a single process (like --threads, which it implies; you can specify --threads unique to change the database sharing).

    This option is only available if gevent is installed.

    Note

    Not all storage types will work properly with this option. RelStorage will, but make sure you select a gevent-compatible driver like PyMySQL or pg8000 for best results. If your driver is not compatible, you may experience timeouts and failures, including UnexpectedChildDeathError. zodbshootout attempts to compensate for this, but may not always be successful.

    New in version 0.6.
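
For example, to compare process-based and thread-based concurrency at the same level (the configuration file name is hypothetical):

$ zodbshootout -c 4 sample.conf
$ zodbshootout -c 4 --threads shared sample.conf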

Repetitions

These options control how many times tests are repeated.

Changed in version 0.7: The old -r and --test-reps options were removed. Instead, use the --loops, --values and --processes options provided by pyperf.
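
For example, this hypothetical invocation uses those pyperf options to trade some statistical rigor for a quicker run:

$ zodbshootout --loops 3 --values 5 --processes 2 sample.conf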

Profiling

  • -p (--profile) enables the Python profiler while running the tests and outputs a profile for each test in the specified directory. Note that the profiler typically reduces the database speed by a lot. This option is intended to help developers isolate performance bottlenecks.

    New in version 0.6.

  • --leaks prints a summary of possibly leaking objects after each test repetition. This is useful for storage and ZODB developers.

    Changed in version 0.7: The old -l alias is no longer accepted.

    New in version 0.6.
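
For example, to profile each test and also check for leaked objects (the directory and configuration file names are placeholders):

$ zodbshootout -p /tmp/zodbshootout-profiles --leaks sample.conf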

Output

These options control the output produced.

Changed in version 0.7: The --dump-json argument was removed in favor of pyperf’s native output format, which enables much better analysis using pyperf show.

If the -o argument is specified, then in addition to creating a single file containing all the test runs, a file will be created for each database, allowing for direct comparisons using pyperf’s compare_to command.
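
For example, results might be saved and then compared with pyperf. The exact names of the per-database files depend on the databases in your configuration; the names below are illustrative only:

$ zodbshootout -o results.json sample.conf
$ python -m pyperf compare_to mysql_hf.json zeo_fs.json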

  • --log enables logging to the console at the specified level. If no level is specified but this option is given, then INFO logging will be enabled. This is useful for details about the workings of a storage and the effects various options have on it.

    Changed in version 0.8: This option can also take a path to a ZConfig logging configuration file.

    New in version 0.6.

You should write a configuration file that models your intended database and network configuration. Running zodbshootout may reveal configuration optimizations that would significantly increase your application’s performance.

zodbshootout Results

The table below shows typical output of running zodbshootout with etc/sample.conf on a dual core, 2.1 GHz laptop:

"Transaction",                postgresql, mysql,   mysql_mc, zeo_fs
"Add 1000 Objects",                 6529,   10027,     9248,    5212
"Update 1000 Objects",              6754,    9012,     8064,    4393
"Read 1000 Warm Objects",           4969,    6147,    21683,    1960
"Read 1000 Cold Objects",           5041,   10554,     5095,    1920
"Read 1000 Hot Objects",           38132,   37286,    37826,   37723
"Read 1000 Steamin' Objects",    4591465, 4366792,  3339414, 4534382

zodbshootout runs six kinds of tests for each database. For each test, zodbshootout instructs all processes (or threads or greenlets, as configured) to perform similar transactions concurrently, computes the mean duration of the concurrent transactions, takes the mean timing of three test runs, and derives how many objects per second the database is capable of writing or reading under the given conditions.

zodbshootout runs these tests:

  • Add objects

    zodbshootout begins a transaction, adds the specified number of persistent objects to a PersistentMapping (or BTree), and commits the transaction. In the sample output above, MySQL was able to add 10027 objects per second to the database, almost twice as fast as ZEO, which was limited to 5212 objects per second. Also, with memcached support enabled, MySQL write performance took a small hit due to the time spent storing objects in memcached.

  • Update objects

    In the same process, without clearing any caches, zodbshootout makes a simple change to each of the objects just added and commits the transaction. The sample output above shows that MySQL and ZEO typically take a little longer to update objects than to add new objects, while PostgreSQL is faster at updating objects in this case. The sample tests only history-preserving databases; you may see different results with history-free databases.

  • Read warm objects

    In a different process, without clearing any caches, zodbshootout reads all of the objects just added. This test favors databases that use either a persistent cache or a cache shared by multiple processes (such as memcached). In the sample output above, this test with MySQL and memcached runs more than ten times faster than ZEO without a persistent cache. (See fs-sample.conf for a test configuration that includes a ZEO persistent cache.)

    In shared thread mode, the database is not closed and reopened, so with concurrency greater than 1, this test is a measure of a shared pickle cache. When concurrency is 1, this test is equivalent to the steamin’ test.

  • Read cold objects

    In the same process as was used for reading warm objects, zodbshootout clears all ZODB caches (the pickle cache, the ZEO cache, and/or memcached) then reads all of the objects written by the update test. This test favors databases that read objects quickly, independently of caching. The sample output above shows that cold read time is currently a significant ZEO weakness.

  • Read prefetched cold objects

    This is just like the previous test, except the objects are prefetched using the ZODB 5 API. This demonstrates the value, if any, of bulk prefetching implemented by a database.

  • Read hot objects

    In the same process as was used for reading cold objects, zodbshootout clears the in-memory ZODB caches (the pickle cache), but leaves the other caches intact, then reads all of the objects written by the update test. This test favors databases that have a process-specific cache. In the sample output above, all of the databases have that type of cache.

  • Read steamin’ objects

    In the same process as was used for reading hot objects, zodbshootout once again reads all of the objects written by the update test. This test favors databases that take advantage of the ZODB pickle cache. As can be seen from the sample output above, accessing an object from the ZODB pickle cache is around 100 times faster than any operation that requires network access or unpickling.
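
For readers unfamiliar with the ZODB APIs these tests exercise, here is a minimal sketch (illustrative only, not zodbshootout’s actual benchmark code) of the add and steamin’ patterns against a plain FileStorage:

import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage
from persistent.mapping import PersistentMapping

db = DB(FileStorage('/tmp/Data.fs'))
conn = db.open()
root = conn.root()

# "Add objects": a single transaction that stores many new persistent objects.
root['data'] = data = PersistentMapping()
for i in range(1000):
    data[i] = PersistentMapping()  # one new persistent object per key
transaction.commit()

# "Read steamin' objects": the objects are still in this connection's
# pickle cache, so these reads never touch the storage.
for i in range(1000):
    data[i]

conn.close()
db.close()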

Changes

0.9.0 (unreleased)

  • Add a benchmark that also updates the BTree when we’re making conflicts. This will involve readCurrent calls in the database.
  • Use custom BTree subclasses so that BTree node sizes can be adjusted.
  • Add a benchmark that generates conflicts on readCurrent objects. Workers write to their own objects, and also randomly call readCurrent on some other worker’s objects. This benchmark tests how well storages handle readCurrent conflicts together with writing.
  • Fix the --use-blobs option.

0.8.0 (2019-11-12)

  • Fix --min-objects. Previously it did nothing.
  • Add --pack to pack each storage before running.
  • Let the --log option take a path to a ZConfig logging configuration file that will be used to configure logging. This allows fine-grained control over log levels.
  • Add a benchmark (prefetch_cold) to test the effect of bulk prefetching objects into the storage cache.
  • Add a benchmark (readCurrent) to test the speed of using Connection.readCurrent (specifically, to see how well it can parallelize).
  • Add a benchmark (tpc) that explicitly (and only) tests moving through the three phases of a successful transaction commit on a storage.
  • Make pre-loading objects with --min-objects faster by using pre-serialized object data.
  • Increase the default size of objects to 300 bytes, and make it the same on Python 2 and Python 3. This closely matches the measurement of the average object size in a large production database (30 million objects).
  • Add a benchmark for allocating new OIDs. See issue #47.
  • Add a benchmark for conflict resolution, designed to emphasize parallel commit. See issue #46.
  • Add a benchmark focused just on storing new objects, eliminating the pickling and OID allocation from the timing. See issue #49.
  • Enhance the transaction commit benchmarks to show the difference between implicit commit and explicit commit. ZODB makes extra storage calls in the implicit case.
  • Add support for Python 3.8.
  • Allow excluding particular benchmarks on the command line. For example, -cold.
  • When benchmarking multiple ZODB configurations, run a particular benchmark for all databases before moving on to the next benchmark. Previously all benchmarks for a database were run before moving on to the next database. This makes it a bit easier to eyeball results as the process is running.

0.7.0 (2019-05-31)

  • Drop support for Python 3.4.
  • Add support for Python 3.7.
  • The timing loops have been rewritten on top of pyperf. This produces much more reliable/stable, meaningful data with a richer set of statistics information captured, and the ability to do analysis and comparisons on data files captured after a run is complete. Some command line options have changed as a result of this, and the output no longer is in terms of “objects per second” but how long a particular loop operation takes. See issue #37 and issue #35.
  • The timing data for in-process concurrency (gevent and threads) attempts to take the concurrency into account to produce more accurate results. See issue #38.
  • Add debug logging when we think we detect gevent-cooperative and gevent-unaware databases in gevent mode.
  • Add the ability to specify only certain subsets of benchmarks to run on the command line. In particular, if you’ve already run the add benchmark once, you can run other benchmarks such as the cold benchmark again independently as many times as you want (as long as you don’t zap the database; that’s not allowed).
  • The benchmarks more carefully verify that they tested what they wanted to. For example, they check that their Connection’s load count matches what it should be (0 in the case of the “steamin” test).
  • Profiling in gevent mode captures the entire set of data in a single file. See issue #33.
  • The --zap option accepts a force argument to eliminate the prompts. See issue #36.
  • Multi-threaded runs handle exceptions and signals more reliably. Partial fix for issue #26.
  • Shared thread read tests clear the caches of connections and the database in a more controlled way, more closely modeling the expected behaviour. Previously the cache clearing was non-deterministic. See issue #28.
  • When using gevent, use its Event and Queue implementations for better cooperation with the event loop.
  • Add --min-objects option to ensure that the underlying database has at least a set number of objects in place. This lets us test scaling issues and be more repeatable. This is tested with FileStorage, ZEO, and RelStorage (RelStorage 2.1a2 or later is needed for accurate results; earlier versions will add new objects each time, resulting in database growth).
  • Remove the unmaintained buildout configuration. See issue #25.
  • Add an option to test the performance of blob storage. See issue #29.
  • Add support for zapping file storages. See issue #43.
  • When zapping, do so right before running the ‘add’ benchmark. This ensures that the databases are all the same size even when the same underlying storage (e.g., MySQL database) is used multiple times in a configuration. Previously, the second and further uses of the same storage would not be zapped and so would grow with the data from the previous contender tests. See issue #42.
  • Add a benchmark for empty transaction commits. This tests the storage synchronization — in RelStorage, it tests polling the RDBMS for invalidations. See issue #41.
  • Add support for using vmprof to profile, instead of cProfile. See issue #34.

0.6.0 (2016-12-13)

This is a major release that focuses on providing more options to fine tune the testing process that are expected to be useful to both deployers and storage authors.

A second major focus has been on producing more stable numeric results. As such, the results from this version are not directly comparable to results obtained from a previous version.

Platforms

  • Add support for Python 3 (3.4, 3.5 and 3.6) and PyPy. Remove support for Python 2.6 and below.
  • ZODB 4 and above are the officially supported versions. ZODB 3 is no longer tested but may still work.

Incompatible Changes

  • Remove support for Python 2.6 and below.
  • The old way of specifying concurrency levels with a comma separated list is no longer supported.

Command Line Tool

The help output and command parsing have been much improved.

  • To specify multiple concurrency levels, specify the -c option multiple times. Similarly, to specify multiple object counts, specify the -n option multiple times. (For example, -c 1 -c 2 -n 100 -n 200 would run four comparisons). The old way of separating numbers with commas is no longer supported.
  • Add the --log option to enable process logging. This is useful when using zodbshootout to understand changes in a single storage.
  • Add --zap to rebuild RelStorage schemas on startup. Useful when switching between Python 2 and Python 3.
  • The reported numbers should be more stable, thanks to running individual tests more times (via the --test-reps option) and taking the mean instead of the min.
  • Add --dump-json to write a JSON representation of more detailed data than is present in the default CSV results.

Test Additions

  • Add support for testing with BTrees (--btrees). This is especially helpful for comparing CPython and PyPy, and is also useful for understanding BTree behaviour.
  • Add support for testing using threads instead of multiprocessing (--threads). This is especially helpful on PyPy or when testing concurrency of a RelStorage database driver and/or gevent. Databases may be shared or unique for each thread.
  • Add support for setting the repetition count (--test-reps). This is especially helpful on PyPy.
  • Use randomized data for the objects instead of a constant string. This lets us more accurately model effects due to compression at the storage or network layers.
  • When gevent is installed, add support for testing with the system monkey patched (--gevent). (Note: This might not be supported by all storages.)
  • Add --leaks to use objgraph to show any leaking objects at the end of each test repetition. Most useful to storage and ZODB developers.

Other

  • Enable continuous integration testing on Travis-CI and coveralls.io.
  • Properly clear ZEO caches on ZODB5. Thanks to Jim Fulton.
  • Improve installation with pip. Extras are provided to make testing RelStorage as easy as testing FileStorage and ZEO.
  • The documentation is now hosted at http://zodbshootout.readthedocs.io/

0.5 (2012-09-08)

  • Updated to MySQL 5.1.65, PostgreSQL 9.1.5, memcached 1.4.15, and libmemcached 1.0.10.
  • Moved development to github.

0.4 (2011-02-01)

  • Added the --object-size parameter.

0.3 (2010-06-19)

  • Updated to memcached 1.4.5, libmemcached 0.40, and pylibmc 1.1+.
  • Updated to PostgreSQL 8.4.4.
  • Updated to MySQL 5.1.47 and a new download url - the old was giving 401’s.

0.2 (2009-11-17)

  • Buildout now depends on a released version of RelStorage.

0.1 (2009-11-17)

  • Initial release.

Development


zodbshootout is hosted at GitHub: https://github.com/zodb/zodbshootout