ZODB Shoot Out¶
This application measures and compares the performance of various ZODB storages and configurations. It is derived from the RelStorage speedtest script, but this version allows arbitrary storage types and configurations, provides more measurements, and produces numbers that are easier to interpret.
Installation¶
zodbshootout can be installed using pip:
pip install zodbshootout
This will install the zodbshootout script along with ZODB and ZEO. To test other storages (such as RelStorage) or storage wrappers (such as zc.zlibstorage), you’ll need to install those packages as well.
RelStorage¶
zodbshootout comes with extras that install RelStorage plus an appropriate database adapter/driver for a specific database:
pip install "zodbshootout[mysql]"
pip install "zodbshootout[postgresql]"
pip install "zodbshootout[oracle]"
Note
This does not actually install the databases. You will need to install those separately (possibly using your operating system’s package manager) and create user accounts as described in the RelStorage documentation.
Tip
This does not install the packages necessary for RelStorage to integrate with Memcache. See the RelStorage documentation for more information on the packages needed to test RelStorage and Memcache.
ZEO¶
When zodbshootout is installed, ZEO is also installed. To test ZEO’s performance, you’ll need to have the ZEO process running, as described in the ZEO documentation.
Running zodbshootout¶
Executable¶
zodbshootout can be executed in one of two ways. The first and most common is via the zodbshootout script created by pip or buildout:
# In an active environment with zodbshootout on the path
$ zodbshootout ...arguments...
# In a non-active virtual environment
$ path/to/venv/bin/zodbshootout ...arguments...
zodbshootout can also be invoked directly as a module using the Python interpreter where it is installed:
python -m zodbshootout
This documentation will simply refer to the zodbshootout script, but both forms are equivalent.
Tip
For the most repeatable, stable results, it is important to choose a fixed value for the hash seed used for Python’s builtin objects (str, bytes, etc.). On CPython 2, this means not passing the -R argument to the interpreter, and not having the PYTHONHASHSEED environment variable set to random. On CPython 3, this means having the PYTHONHASHSEED environment variable set to a fixed value. On a Unix-like system, this invocation will work for both versions:
$ PYTHONHASHSEED=0 zodbshootout ...arguments...
Configuration File¶
The zodbshootout script requires the name of a database configuration file. The configuration file contains a list of databases to test, in ZConfig format. The script then writes and reads each of the databases while taking measurements. During this process, the measured times are output for each test of each database; there are a number of command-line options to control the output or save it to files for later analysis. (See the pyperf user guide for information on configuring the output and adjusting the benchmark process.)
An example of a configuration file testing the built-in ZODB file storage, a few variations of ZEO, and RelStorage would look like this:
# This configuration compares a database running raw FileStorage
# (no ZEO), along with a databases running FileStorage behind ZEO
# with a persistent ZEO cache, with some other databases.
#
# *This test can only run with a concurrency level of 1 if using
# multiple processes. To use higher concurrency levels, you need to
# use ``--threads``.*
%import relstorage
<zodb fs>
<filestorage>
path var/Data2.fs
</filestorage>
</zodb>
<zodb zeofs_pcache>
<zeoclient>
server localhost:24003
client 0
var var
cache-size 200000000
</zeoclient>
</zodb>
<zodb zeo_fs>
<zeoclient>
server localhost:24003
</zeoclient>
</zodb>
<zodb mysql_hf>
<relstorage>
keep-history false
poll-interval 5
<mysql>
db relstoragetest_hf
user relstoragetest
passwd relstoragetest
</mysql>
</relstorage>
</zodb>
<zodb mysql_hf_mc>
<relstorage>
keep-history false
poll-interval 5
cache-module-name relstorage.pylibmc_wrapper
cache-servers localhost:24005
<mysql>
db relstoragetest_hf
user relstoragetest
passwd relstoragetest
</mysql>
</relstorage>
</zodb>
The corresponding ZEO configuration file would look like this:
<zeo>
address 24003
read-only false
invalidation-queue-size 100
pid-filename var/zeo.pid
# monitor-address PORT
# transaction-timeout SECONDS
</zeo>
<filestorage 1>
path var/Data.fs
</filestorage>
Note
If you’ll be using RelStorage, you’ll need to have the appropriate RDBMS processes installed, running, and properly configured. Likewise, if you’ll be using ZEO, you’ll need to have the ZEO server running. For pointers to more information, see Installation.
Options¶
The zodbshootout script accepts the following options; each is described below.
$ zodbshootout --help
Changed in version 0.7: You can now specify just a subset of benchmarks to run by giving their names as extra command line arguments after the configuration file.
Objects¶
These options control the objects put in the database.
--object-counts
specifies how many persistent objects to write or read per transaction. The default is 1000.
Changed in version 0.7: The old alias of -n is no longer accepted; pyperf uses that to determine the number of loop iterations. Also, this option can now only be used once.
Changed in version 0.6: Specify this option more than once to run the tests with different object counts.
--btrees
causes the data to be stored in BTrees optimized for ZODB usage (without this option, a PersistentMapping will be used). This is an advanced option that may be useful when tuning particular applications and usage scenarios. It adds additional objects to manage the buckets that make up the BTree. However, if IO BTrees are used (the default when this option is specified), internal storage of keys as integers may reduce pickle times and sizes (and thus improve cache efficiency). This option can take an argument of either IO or OO to specify the type of BTree to use.
This option is especially interesting on PyPy or when comparing the pure-Python implementation of BTrees to the C implementation.
New in version 0.6.
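The point about integer keys can be illustrated outside of ZODB with a rough sketch using the standard pickle module. Plain dicts stand in for BTrees here, so the exact savings in real IO BTrees will differ:

```python
import pickle

# Rough illustration (plain dicts standing in for BTrees): integer keys
# pickle more compactly than string keys, which is one reason IO BTrees
# can reduce pickle sizes and improve cache efficiency.
int_keyed = {i: b"x" * 8 for i in range(1000)}
str_keyed = {"key-%d" % i: b"x" * 8 for i in range(1000)}

int_size = len(pickle.dumps(int_keyed, protocol=2))
str_size = len(pickle.dumps(str_keyed, protocol=2))
print(int_size < str_size)  # the integer-keyed mapping pickles smaller
```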
--zap
recreates the tables and indexes for a RelStorage database or a ZODB FileStorage. This option completely destroys any existing data. You will be prompted to confirm that you want to do this for each database that supports it. This is handy for comparing Python 2 and Python 3 (which can’t otherwise use the same database schemas).
Caution: This option destroys all data in the relevant database.
Changed in version 0.7: You can now specify an argument of force to disable the prompt and zap all databases. You can also give a comma-separated list of database names to zap; only those databases will be cleared (without prompting).
New in version 0.6.
--min-objects
ensures that at least the specified number of objects exist in the database, independently of the objects being tested. If the database packs away objects or if --zap is used, this option will add back the necessary number of objects. If there are more objects, nothing will be done. This option is helpful for testing scalability issues.
New in version 0.7.
--blobs
causes zodbshootout to read and write blobs instead of simple persistent objects. This can be useful for testing options like shared blob dirs on network filesystems, or RelStorage’s blob-chunk-size, or for diagnosing performance problems. If objects have to be added to meet the --min-objects count, they will also be blobs. Note that because of the way blobs work, twice the number of objects given in --object-counts will be stored. Expect this option to make the test much slower.
New in version 0.7.
Concurrency¶
These options control the concurrency of the testing.
-c (--concurrency)
specifies how many tests to run in parallel. The default is 2. Each of the concurrent tests runs in a separate process to prevent contention over the CPython global interpreter lock. In single-host configurations, the performance measurements should increase with the concurrency level, up to the number of CPU cores in the computer. In more complex configurations, performance will be limited by other factors such as network latency.
Changed in version 0.7: This option can only be used once.
Changed in version 0.6: Specify this option more than once to run the tests with different concurrency levels.
--threads
uses in-process threads for concurrency instead of multiprocessing. This can demonstrate how the GIL affects various database adapters under RelStorage, for instance. It can also demonstrate the difference that warmup time makes for things like PyPy’s JIT.
By default, or if you give the shared argument to this option, all threads will share one ZODB DB object and re-use Connections from the same pool; most threaded applications will use ZODB in this manner. If you specify the unique argument, then each thread will get its own DB object. In addition to showing how the thread-locking strategy of the underlying storage affects things, this can also highlight the impact of shared caches.
New in version 0.6.
--gevent
monkey-patches the system and uses cooperative greenlet concurrency in a single process (like --threads, which it implies; you can specify --threads unique to change the database sharing).
This option is only available if gevent is installed.
Note
Not all storage types will work properly with this option. RelStorage will, but make sure you select a gevent-compatible driver like PyMySQL or pg8000 for best results. If your driver is not compatible, you may experience timeouts and failures, including UnexpectedChildDeathError. zodbshootout attempts to compensate for this, but may not always be successful.
New in version 0.6.
Repetitions¶
These options control how many times tests are repeated.
Changed in version 0.7: The old -r and --test-reps options were removed. Instead, use the --loops, --values and --processes options provided by pyperf.
Profiling¶
-p (--profile)
enables the Python profiler while running the tests and outputs a profile for each test in the specified directory. Note that the profiler typically reduces the database speed by a lot. This option is intended to help developers isolate performance bottlenecks.
New in version 0.6.
--leaks
prints a summary of possibly leaking objects after each test repetition. This is useful for storage and ZODB developers.
Changed in version 0.7: The old -l alias is no longer accepted.
New in version 0.6.
Output¶
These options control the output produced.
Changed in version 0.7: The --dump-json argument was removed in favor of pyperf’s native output format, which enables much better analysis using pyperf show.
If the -o argument is specified, then in addition to creating a single file containing all the test runs, a file will be created for each database, allowing for direct comparisons using pyperf’s compare_to command.
--log
enables logging to the console at the specified level. If no level is specified but this option is given, then INFO logging will be enabled. This is useful for details about the workings of a storage and the effects various options have on it.
Changed in version 0.8: This option can also take a path to a ZConfig logging configuration file.
New in version 0.6.
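As a rough sketch of what such a logging configuration file could look like (assuming ZConfig’s standard <logger>/<logfile> logging components; consult the ZConfig documentation for the authoritative schema):

```
<logger>
  level DEBUG
  <logfile>
    path STDOUT
    format %(asctime)s %(levelname)s [%(name)s] %(message)s
  </logfile>
</logger>
```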
You should write a configuration file that models your intended database and network configuration. Running zodbshootout may reveal configuration optimizations that would significantly increase your application’s performance.
zodbshootout Results¶
The table below shows typical output of running zodbshootout with etc/sample.conf on a dual-core, 2.1 GHz laptop:
"Transaction", postgresql, mysql, mysql_mc, zeo_fs
"Add 1000 Objects", 6529, 10027, 9248, 5212
"Update 1000 Objects", 6754, 9012, 8064, 4393
"Read 1000 Warm Objects", 4969, 6147, 21683, 1960
"Read 1000 Cold Objects", 5041, 10554, 5095, 1920
"Read 1000 Hot Objects", 38132, 37286, 37826, 37723
"Read 1000 Steamin' Objects", 4591465, 4366792, 3339414, 4534382
zodbshootout runs several kinds of tests for each database. For each test, zodbshootout instructs all processes (or threads or greenlets, as configured) to perform similar transactions concurrently, computes the mean duration of the concurrent transactions, takes the mean timing of three test runs, and derives how many objects per second the database is capable of writing or reading under the given conditions.
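That derivation can be sketched with hypothetical timings (the numbers below are illustrative, not taken from the table above):

```python
# Hypothetical numbers showing how an objects-per-second rate is derived.
objects_per_txn = 1000
run_means = [0.21, 0.20, 0.19]  # mean concurrent txn duration of each run (s)

mean_duration = sum(run_means) / len(run_means)   # 0.2 seconds
objects_per_second = objects_per_txn / mean_duration
print(round(objects_per_second))  # 5000
```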
zodbshootout runs these tests:
Add objects
zodbshootout begins a transaction, adds the specified number of persistent objects to a PersistentMapping or BTree, and commits the transaction. In the sample output above, MySQL was able to add 10027 objects per second to the database, almost twice as fast as ZEO, which was limited to 5212 objects per second. Also, with memcached support enabled, MySQL write performance took a small hit due to the time spent storing objects in memcached.
Update objects
In the same process, without clearing any caches, zodbshootout makes a simple change to each of the objects just added and commits the transaction. The sample output above shows that MySQL and ZEO typically take a little longer to update objects than to add new objects, while PostgreSQL is faster at updating objects in this case. The sample tests only history-preserving databases; you may see different results with history-free databases.
Read warm objects
In a different process, without clearing any caches, zodbshootout reads all of the objects just added. This test favors databases that use either a persistent cache or a cache shared by multiple processes (such as memcached). In the sample output above, this test with MySQL and memcached runs more than ten times faster than ZEO without a persistent cache. (See fs-sample.conf for a test configuration that includes a ZEO persistent cache.)
In shared thread mode, the database is not closed and reopened, so with concurrency greater than 1, this test is a measure of a shared pickle cache. When concurrency is 1, this test is equivalent to the steamin’ test.
Read cold objects
In the same process as was used for reading warm objects, zodbshootout clears all ZODB caches (the pickle cache, the ZEO cache, and/or memcached), then reads all of the objects written by the update test. This test favors databases that read objects quickly, independently of caching. The sample output above shows that cold read time is currently a significant ZEO weakness.
Read prefetched cold objects
This is just like the previous test, except the objects are prefetched using the ZODB 5 API. This demonstrates any value of bulk prefetching implemented in a database.
Read hot objects
In the same process as was used for reading cold objects, zodbshootout clears the in-memory ZODB caches (the pickle cache) but leaves the other caches intact, then reads all of the objects written by the update test. This test favors databases that have a process-specific cache. In the sample output above, all of the databases have that type of cache.
Read steamin’ objects
In the same process as was used for reading hot objects, zodbshootout once again reads all of the objects written by the update test. This test favors databases that take advantage of the ZODB pickle cache. As can be seen from the sample output above, accessing an object from the ZODB pickle cache is around 100 times faster than any operation that requires network access or unpickling.
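The roughly 100x figure is easy to check against the sample table. Taking the MySQL column:

```python
# Sample rates from the table above (objects per second, MySQL column).
hot = 37286          # "Read 1000 Hot Objects"
steamin = 4366792    # "Read 1000 Steamin' Objects"

ratio = steamin / hot
print(round(ratio))  # 117: pickle-cache hits are over 100x faster than hot reads
```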
Changes¶
0.9.0 (unreleased)¶
- Add a benchmark that also updates the BTree when we’re making conflicts. This will involve readCurrent calls in the database.
- Use custom BTree subclasses so that BTree node sizes can be adjusted.
- Add a benchmark that generates conflicts on readCurrent objects. Workers write to their own objects, and also randomly call readCurrent on some other worker’s objects. This benchmark tests how well storages handle readCurrent conflicts together with writing.
- Fix the --use-blobs option.
0.8.0 (2019-11-12)¶
- Fix --min-objects. Previously it did nothing.
- Add --pack to pack each storage before running.
- Let the --log option take a path to a ZConfig logging configuration file that will be used to configure logging. This allows fine-grained control over log levels.
- Add a benchmark (prefetch_cold) to test the effect of bulk prefetching objects into the storage cache.
- Add a benchmark (readCurrent) to test the speed of using Connection.readCurrent (specifically, to see how well it can parallelize).
- Add a benchmark (tpc) that explicitly (and only) tests moving through the three phases of a successful transaction commit on a storage.
- Make pre-loading objects with --min-objects faster by using pre-serialized object data.
- Increase the default size of objects to 300 bytes, and make it the same on Python 2 and Python 3. This closely matches the measurement of the average object size in a large production database (30 million objects).
- Add a benchmark for allocating new OIDs. See issue #47.
- Add a benchmark for conflict resolution, designed to emphasize parallel commit. See issue #46.
- Add a benchmark focused just on storing new objects, eliminating the pickling and OID allocation from the timing. See issue #49.
- Enhance the transaction commit benchmarks to show the difference between implicit commit and explicit commit. ZODB makes extra storage calls in the implicit case.
- Add support for Python 3.8.
- Allow excluding particular benchmarks on the command line. For example, -cold.
- When benchmarking multiple ZODB configurations, run a particular benchmark for all databases before moving on to the next benchmark. Previously all benchmarks for a database were run before moving on to the next database. This makes it a bit easier to eyeball results as the process is running.
0.7.0 (2019-05-31)¶
- Drop support for Python 3.4.
- Add support for Python 3.7.
- The timing loops have been rewritten on top of pyperf. This produces much more reliable/stable, meaningful data with a richer set of statistics information captured, and the ability to do analysis and comparisons on data files captured after a run is complete. Some command line options have changed as a result of this, and the output no longer is in terms of “objects per second” but how long a particular loop operation takes. See issue #37 and issue #35.
- The timing data for in-process concurrency (gevent and threads) attempts to take the concurrency into account to produce more accurate results. See issue #38.
- Add debug logging when we think we detect gevent-cooperative and gevent-unaware databases in gevent mode.
- Add the ability to specify only certain subsets of benchmarks to run on the command line. In particular, if you’ve already run the add benchmark once, you can run other benchmarks such as the cold benchmark again independently as many times as you want (as long as you don’t zap the database; that’s not allowed).
- The benchmarks more carefully verify that they tested what they wanted to. For example, they check that their Connection’s load count matches what it should be (0 in the case of the “steamin” test).
- Profiling in gevent mode captures the entire set of data in a single file. See issue #33.
- The --zap option accepts a force argument to eliminate the prompts. See issue #36.
- Multi-threaded runs handle exceptions and signals more reliably. Partial fix for issue #26.
- Shared thread read tests clear the caches of connections and the database in a more controlled way, more closely modeling the expected behaviour. Previously the cache clearing was non-deterministic. See issue #28.
- When using gevent, use its Event and Queue implementations for better cooperation with the event loop.
- Add the --min-objects option to ensure that the underlying database has at least a set number of objects in place. This lets us test scaling issues and be more repeatable. This is tested with FileStorage, ZEO, and RelStorage (RelStorage 2.1a2 or later is needed for accurate results; earlier versions will add new objects each time, resulting in database growth).
- Remove the unmaintained buildout configuration. See issue #25.
- Add an option to test the performance of blob storage. See issue #29.
- Add support for zapping file storages. See issue #43.
- When zapping, do so right before running the ‘add’ benchmark. This ensures that the databases are all the same size even when the same underlying storage (e.g., MySQL database) is used multiple times in a configuration. Previously, the second and further uses of the same storage would not be zapped and so would grow with the data from the previous contender tests. See issue #42.
- Add a benchmark for empty transaction commits. This tests the storage synchronization — in RelStorage, it tests polling the RDBMS for invalidations. See issue #41.
- Add support for using vmprof to profile, instead of cProfile. See issue #34.
0.6.0 (2016-12-13)¶
This is a major release that focuses on providing more options to fine tune the testing process that are expected to be useful to both deployers and storage authors.
A second major focus has been on producing more stable numeric results. As such, the results from this version are not directly comparable to results obtained from a previous version.
Platforms¶
- Add support for Python 3 (3.4, 3.5 and 3.6) and PyPy. Remove support for Python 2.6 and below.
- ZODB 4 and above are the officially supported versions. ZODB 3 is no longer tested but may still work.
Incompatible Changes¶
- Remove support for Python 2.6 and below.
- The old way of specifying concurrency levels with a comma separated list is no longer supported.
Command Line Tool¶
The help output and command parsing has been much improved.
- To specify multiple concurrency levels, specify the -c option multiple times. Similarly, to specify multiple object counts, specify the -n option multiple times. (For example, -c 1 -c 2 -n 100 -n 200 would run four comparisons.) The old way of separating numbers with commas is no longer supported.
- Add the --log option to enable process logging. This is useful when using zodbshootout to understand changes in a single storage.
- Add --zap to rebuild RelStorage schemas on startup. Useful when switching between Python 2 and Python 3.
- The reported numbers should be more stable, thanks to running individual tests more times (via the --test-reps option) and taking the mean instead of the min.
- Add --dump-json to write a JSON representation of more detailed data than is present in the default CSV results.
Test Additions¶
- Add support for testing with BTrees (--btrees). This is especially helpful for comparing CPython and PyPy, and is also useful for understanding BTree behaviour.
- Add support for testing using threads instead of multiprocessing (--threads). This is especially helpful on PyPy or when testing concurrency of a RelStorage database driver and/or gevent. Databases may be shared or unique for each thread.
- Add support for setting the repetition count (--test-reps). This is especially helpful on PyPy.
- Use randomized data for the objects instead of a constant string. This lets us more accurately model effects due to compression at the storage or network layers.
- When gevent is installed, add support for testing with the system monkey-patched (--gevent). (Note: This might not be supported by all storages.)
- Add --leaks to use objgraph to show any leaking objects at the end of each test repetition. Most useful to storage and ZODB developers.
Other¶
- Enable continuous integration testing on Travis-CI and coveralls.io.
- Properly clear ZEO caches on ZODB5. Thanks to Jim Fulton.
- Improve installation with pip. Extras are provided to make testing RelStorage as easy as testing FileStorage and ZEO.
- The documentation is now hosted at http://zodbshootout.readthedocs.io/
0.5 (2012-09-08)¶
- Updated to MySQL 5.1.65, PostgreSQL 9.1.5, memcached 1.4.15, and libmemcached 1.0.10.
- Moved development to github.
0.4 (2011-02-01)¶
- Added the --object-size parameter.
0.3 (2010-06-19)¶
- Updated to memcached 1.4.5, libmemcached 0.40, and pylibmc 1.1+.
- Updated to PostgreSQL 8.4.4.
- Updated to MySQL 5.1.47 and a new download URL; the old one was giving 401s.
0.2 (2009-11-17)¶
- Buildout now depends on a released version of RelStorage.
0.1 (2009-11-17)¶
- Initial release.