Welcome to Pyzor’s documentation!

Latest PyPI version Number of PyPI downloads Build status

Contents:

Introduction

Pyzor is a collaborative, networked system to detect and block spam using digests of messages.

Using Pyzor client a short digest is generated that is likely to uniquely identify the email message. This digest is then sent to a Pyzor server to:

  • check the number of times it has been reported as spam or whitelisted as not-spam
  • report the message as spam
  • whitelist the message as not-spam

Since the entire system is released under the GPL, people are free to host their own independent servers. There is, however, a well-maintained and actively used public server available (courtesy of SpamExperts) at:

public.pyzor.org:24441

Getting the source

To clone the repository using git simply run:

git clone https://github.com/SpamExperts/pyzor

Please feel free to fork us and submit your pull requests.

Running tests

The pyzor tests are split into unittest and functional tests.

Unitests perform checks against the current source code and not the installed version of pyzor. To run all the unittests suite:

env PYTHONPATH=. python tests/unit/__init__.py

Functional tests perform checks against the installed version of pyzor and not the current source code. These are more extensive and generally take longer to run. They also might need special setup. To run the full suite of functional tests:

env PYTHONPATH=. python tests/functional/__init__.py

There is also a helper script available that sets-up the testing enviroment, also taking into consideration the python version you are currently using:

./scripts/run_tests

Note

The authentication details for the MySQL functional tests are taken from the test.conf file.

License

The project is licensed under the GNU GPLv2 license.

Getting Pyzor

Installing

The recommended and easiest way to install Pyzor is with pip:

pip install pyzor

In order to upgrade your Pyzor version run:

pip install --upgrade pyzor

Note

The latest version requires at least Python 2.6.6

The Pyzor code will also work on Python3, but requires refactoring done with the 2to3 tool. This has been integrated in the setup, so installation in Python3 now works seamlessly with any method described above.

You can also use Pyzor with PyPy.

Downloading

If you don’t want to or cannot use pip to download and install Pyzor. You can do so directly from the source:

python setup.py install

You can find the latest and older versions of Pyzor on PyPI.

Dependencies

Pyzor Client

If you plan on only using Pyzor to check message against our public server, then there are no required dependencies.

Pyzor Server

If you want to host your own Pyzor Server then you will need an appropriate back-end engine. Depending on what engine you want to use you will also need to install the required python dependecies. Please see Engines for more details.

The Pyzor also support the gevent library. If you want to use this feature then you will need to first install it.

Usage

Contents:

Pyzor Client

The Pyzor Client is a Python script deployed with the package. It provides a command line interface to the Pyzor Client API:

pyzor [options] command

You can also use the Python API directly to integrate Pyzor in your solution. For more information see pyzor.client.

Commands

Check

Checks the message read from stdin and prints the number of times it has been reported and the number of time it has been whitelisted. If multiple servers are listed in the configuration file each server is checked:

$ pyzor check < spam.eml
public.pyzor.org:24441      (200, 'OK')     134504  4681

The exit code will be:

  • 1 if the report count is 0 or the whitelist count is > 0
  • 0 if the report count is > 0 and the whitelist count is 0

Note that you can configure this behaviour by changing the report/whitelist thresholds from the configuration file or the command-line options. See client configuration.

Info

Prints detailed information about the message. The exit code will always be zero (0) if all servers returned (200, ‘OK’):

$ pyzor info < spam.eml
public.pyzor.org:24441      (200, 'OK')
    Count: 134538
    Entered: Sat Jan  4 10:01:34 2014
    Updated: Mon Mar 17 12:52:04 2014
    WL-Count: 4681
    WL-Entered: Mon Jan  6 14:32:01 2014
    WL-Updated: Fri Mar 14 16:11:02 2014
Report

Reports to the server a digest of each message as spam. Writes to standard output a tuple of (error-code, message) from the server. If multiple servers are listed in the configuration file the message is reported to each one:

$ pyzor report < spam.eml
public.pyzor.org:24441      (200, 'OK')
Whitelist

Reports to the server a digest of each message as not-spam. Writes to standard output a tuple of (error-code, message) from the server. If multiple servers are listed in the configuration file the message is reported to each one:

$ pyzor whitelist < spam.eml
public.pyzor.org:24441      (200, 'OK')

Note

This command is not available by default for the anonymous user.

Ping

Merely requests a response from the servers:

$ pyzor ping
public.pyzor.org:24441      (200, 'OK')
Pong

Can be used to test pyzor, this will always return a large number of reports and 0 whitelist, regardless of the message:

$ pyzor pong < ham.eml
public.pyzor.org:24441      (200, 'OK')     9223372036854775807     0
Predigest

Prints the message after the predigest phase of the pyzor algorithm.

Digest

Prints the message digest, that will be sent to the server.

Genkey

Based upon a secret passphrase gathered from the user and randomly gathered salt, prints to standard output a tuple of “salt,key”. Used to put account information into the accounts file.

Servers File

This file contains a list of servers that will be contacted by the Pyzor client for every operation. If no servers are specified it defaults to the public server:

public.pyzor.org:24441

The servers can also be specified as IP addresses, but they must always be followed by the port number.

For example having this in ~/.pyzor/servers:

# This is comment
public.pyzor.org:24441
127.0.0.1:24441

Will configure the client to check both the public server and a local one:

$ pyzor ping
public.pyzor.org:24441  (200, 'OK')
127.0.0.1:24441 (200, 'OK')

Input Style

Pyzor accepts messages in various forms. This can be controlled with the style configuration or command line option. Currently support are:

  • msg - individual RFC5321 message
  • mbox - mbox file of messages
  • digests - Pyzor digests, one per line

Pyzor Server

The Pyzor Server will listen on the specified address and any serve request from Pyzor Clients.

Daemon

Starting

The Pyzor Server can be started as a daemon by using the --detach option. This will:

  • daemonize the script and detach from tty
  • create a pid file
  • redirect any output to the specified file

Example:

$ pyzord --detach /dev/null --homedir=/home/user/.pyzor/
Stopping

To safely stop the Pyzor Server you can use the TERM signal to trigger a safe shutdown:

$ kill -TERM `cat /home/user/.pyzor/pyzord.pid`
Reloading

The reload signal will tell the Pyzor Server to reopen and read the access and passwd files. This is useful when adding new accounts or changing the permissions for an existing account. This is done by sending the USR1 signal to the process:

$ kill -USR1 `cat /home/user/.pyzor/pyzord.pid`

Engines

The Pyzor Server supports a number of back-end database engines to store the message digests.

Gdbm

This is the default engine, and the easiest to use and configure. But this it is also highly inefficient and not recommended for servers that see a large number of requests.

To use the the gdbm engine simply add to the config file ~/.pyzor/config:

[server]
Engine = gdbm
DigestDB = pyzord.db

The database file will be created if it didn’t previously exists, and will be located as usual in the specified Pyzor homedir.

For more information about GDBM see http://www.gnu.org.ua/software/gdbm/.

MySQL

This will require the MySQL-python library.

Note

MySQL-python does not currently support Python 3

To configure the MySQL engine you will need to:

  • Create a MySQL database (for e.g. pyzor)

  • Create a MySQL table with the following schema:

    CREATE TABLE `digests` (
        `digest` char(40) default NULL,
        `r_count` int(11) default NULL,
        `wl_count` int(11) default NULL,
        `r_entered` datetime default NULL,
        `wl_entered` datetime default NULL,
        `r_updated` datetime default NULL,
        `wl_updated` datetime default NULL,
        PRIMARY KEY  (`digest`)
    )
    
  • Create a MySQL user

  • Grant ALL PRIVILEGES to that user on the newly created table

To use the MySQL engine add to the configuration file:

[server]
Engine = mysql
DigestDB = localhost,user,password,pyzor,digests
Redis

This will require the redis library.

To use the redis engine simply add to the configuration file:

[server]
Engine = redis
DigestDB = localhost,6379,,0

Or if a password is required:

[server]
Engine = redis
DigestDB = localhost,6379,password,0

In the example above the redis database used is 0.

Migrating

If you want to migrate your database from one engine to another there is an utility script installed with pyzor designed to do this. Note that the arguments are the equivalent of the Engine and DigestDB options. Some usage examples:

  • Moving a database from gdbm to redis:

    pyzor-migrate --se gdbm --sd testdata/backup.db --de redis --dd localhost,6379,,0
    
  • Moving a database from redis to MySQL:

    pyzor-migrate --se redis --sd localhost,6379,,0 --de mysql --dd localhost,root,,pyzor,public
    

Access File

This file can be used to restrict or grant access to various server-side operations to accounts. For more information on setting up accounts see accounts.

The format is very similar to the popular tcp_wrappers hosts.{allow,deny}:

privilege ... : username ... : allow|deny
privilege:a list of whitespace-separated commands The keyword all can be used to to refer to all commands.
username:a list of whitespace-separated usernames. The keyword all can be used to refer to all users. The anonymous user is refereed to as anonymous.
allow|deny:whether or not the specified user(s) can perform the specified privilege(s) on the line.

The file is processed from top to bottom, with the first match for user/privilege being the value taken. Every file has the following implicit final rule:

all : all : deny

If this file is non-existant, the following default is used:

check report ping pong info : anonymous : allow

Accounts

Pyzor Accounts can be used to grant or restrict access to the Pyzor Server, by ensuring the client are authenticated.

To get an account on a server requires coordination between the client user and server admin. Use the following steps:

  1. User and admin should agree on a username for the user. Allowed characters for a username are alpha-numerics, the underscore, and dashes. The normative regular expression it must match is ^[-\.\w]+$. Let us assume they have agreed on bob.

  2. User generates a key with pyzor genkey. Let us say that it generates the salt,key of:

    227bfb58efaba7c582d9dcb66ab2063d38df2923,8da9f54058c34e383e997f45d6eb74837139f83b
    
  3. Assuming the server is at 127.0.0.1:9999, the user puts the following entry into ~/.pyzor/accounts:

    127.0.0.1 : 9999 : bob : 227bfb58efaba7c582d9dcb66ab2063d38df2923,8da9f54058c34e383e997f45d6eb74837139f83b
    

    This tells the Pyzor Client to use the bob account for server 127.0.0.1:9999. It will still use the anonymous user for all other servers.

  4. The user then sends the key (the part to the right-hand side of the comma) to the admin.

  5. The admin adds the key to their ~/.pyzor/pyzord.passwd:

    bob : 8da9f54058c34e383e997f45d6eb74837139f83b
    
  6. Assuming the admin wants to give the privilege of whitelisting (in addition to the normal permissions), the admin then adds the appropriate permissions to ~/.pyzor/pyzord.access:

    check report ping pong info whitelist : bob : allow
    

    For more information see Access File.

  7. To reload the account and access information send the USR1 signal to the daemon.

Procmail

To use Pyzor in a procmail system, consider using the following simple recipe:

 :0 Wc
| pyzor check :0 a
pyzor-caught

If you prefer, you can merely add a header to message marked with Pyzor, instead of immediately filtering them into a separate folder:

 :0 Wc
| pyzor check :0 Waf
| formail -A 'X-Pyzor: spam'

ReadyExec

ReadyExec is a system to eliminate the high startup-cost of executing scripts repeatedly. If you execute Pyzor a lot, you might be interested in installing ReadyExec and using it with Pyzor.

To use Pyzor with ReadyExec, the readyexecd.py server needs to be started as:

readyexecd.py socket_file pyzor.client.run

socket_file can be any (non-existing) filename you wish ReadyExec to use, such as /tmp/pyzor:

readyexecd.py /tmp/pyzor pyzor.client.run

Individual clients are then executed as:

readyexec socket_file options command cmd_options

For example:

readyexec /tmp/pyzor check
readyexec /tmp/pyzor report
readyexec /tmp/pyzor whitelist --style=mbox
readyexec /tmp/pyzor -d ping

Configuration

The format of this file is INI-style (name=value, divided into [sections]). Names are case insensitive. All values which are filenames can have shell-style tildes (~) in them. All values which are relative filenames are interpreted to be relative to the Pyzor homedir. All of these options can be overridden by command-line arguments.

It is recommended to use the provided sample configuration. Simply copy it in pyzor’s homedir, remove the .sample from the name and alter any configurations you prefer.

client configuration

ServersFile
Must contain a newline-separated list of server addresses to report/whitelist/check with. All of these server will be contacted for every operation. See Servers File.
AccountsFile
File containing information about accounts on servers. See Accounts.
LogFile
If this is empty then logging is done to stdout.
Timeout
This options specifies the number of seconds that the pyzor client should wait for a response from the server before timing out.
Style
Specify the message input style. See Input Style.
ReportThreshold
If the number of reports exceeds this threshold then the exit code of the pyzor client is 0.
WhitelistThreshold
If the number of whitelists exceed this threshold then exit code of the pyzor client is 1.

server configuration

Port
Port to listen on.
ListenAddress
Address to listen on.
LogFile
File to contain server logs.
UsageLogFile
File to contain server usage logs (information about each request).
PidFile
This file contain the pid of the pyzord daemon when used with the –detach option.
PasswdFile
File containing a list of user account information. See Accounts.
AccessFile
File containing information about user privileges. See Access File.
Gevent
If set to true uses the gevent library.
Engine
Then engine type to be used for storage. See Engines.
DigestDB
The database connection information. Format varies depending on the engine used. See Engines.
CleanupAge
The maximum age of a record before it gets removed (in seconds). To disable this set to 0.
Threads
If set to true, the pyzor server will use multi-threading to serve requests.
MaxThreads
The maximum number of concurrent threads (0 means unlimited).
DBConnections
The number of database connections kept opened by the server (0 means a new one for each request).

Note

DBConnections only applies to the MySQL engine.

Processes
If set to true, the pyzor server will use multi-processing to serve requests.
MaxProcesses
The maximum number of concurrent processes (cannot be unlimited).

Changelog

Pyzor 0.8.0

Bug fixes:

  • Fix unicode decoding issues. (#1)

New features:

  • A new option for the pyzor server to set-up digest forwarding.
  • A new script pyzor-migrate is now available. The script allows migrating your digest database from one engine to another. (#2)

Perfomance enhancements:

  • Use multiple threads when connecting to multiple servers in the pyzor client script. (#5)
  • A new BatchClient is available in pyzor client API. The client now send reports in batches to the pyzor server. (#13)

Others:

  • Small adjustments to the pyzor scripts to add Windows compatibility.
  • Automatically build documentation.
  • Continuous integration on Travis-CI.
  • Test coverage on coveralls.

Pyzor 0.7.0

Bug fixes:

  • Fix decoding bug when messages are badly formed
  • Pyzor now correctly creates the specified homedir, not the user’s one

New features:

  • Logging is now disabled by default
  • Automatically run 2to3 during installation (if required)

New pyzord features:

  • Added ability to disable expiry
  • New redis engine support has been added
  • New option to enable gevent
  • Added the ability to reload accounts and access files using USR1 signal
  • Added the ability to safely stop the daemon with TERM signal
  • Split the usage-log and normal log in two separate files
  • Pyzord daemon can now daemonize and detach itself

Pyzor 0.6.0

  • pyzor and pyzord will now work with Python3.3 (if the the 2to3-3.3 is previously ran)
  • pyzord and pyzor now supports IPv6
  • Improved handling of multi-threading (signals where again removed) for the mysql engine
  • Introduced multi-processing capabilities
  • Improved HTML parsing
  • Introduced self document sample configurations
  • Introduced ability to set custom report/whitelist thresholds for the pyzor client
  • Greatly improved tests coverage

Pyzor 0.5.0

Note that the majority of changes in this release were contributed back from the Debian pyzor package.

  • Man pages for pyzor and pyzord.
  • Changing back to signals for database locking, rather than threads. It is likely that signals will be removed again in the future, but the existing threading changes caused problems.
  • Basic checks on the results of “discover”.
  • Extended mbox support throughout the library.
  • Better handling on unknown encodings.
  • Added a –log option to log to a file.
  • Better handling of command-line options.
  • Improved error handling.

About

History

Pyzor initially started out to be merely a Python implementation of Razor, but due to the protocol and the fact that Razor’s server is not Open Source or software libre, Frank Tobin decided to implement Pyzor with a new protocol and release the entire system as Open Source and software libre.

Protocol

The central premise of Pyzor is that it converts an email message to a short digest that uniquely identifies the message. Simply hashing the entire message is an ineffective method of generating a digest, because message headers will differ when the content does not, and because spammers will often try to make a message unique by injecting random/unrelated text into their messages.

To generate a digest, the 2.0 version of the Pyzor protocol:

  • Discards all message headers.
  • If the message is greater than 4 lines in length:
  • Discards the first 20% of the message.
  • Uses the next 3 lines.
  • Discards the next 40% of the message.
  • Uses the next 3 lines.
  • Discards the remainder of the message.
  • Removes any ‘words’ (sequences of characters separated by whitespace) that are 10 or more characters long.
  • Removes anything that looks like an email address (X@Y).
  • Removes anything that looks like a URL.
  • Removes anything that looks like HTML tags.
  • Removes any whitespace.
  • Discards any lines that are fewer than 8 characters in length.

This is intended as an easy-to-understand explanation, rather than a technical one.

Reference

pyzor.engines

pyzor.engines.common

Common library shared by different engines.

class pyzor.engines.common.DBHandle

Bases: tuple

DBHandle(single_threaded, multi_threaded, multi_processing)

multi_processing

Alias for field number 2

multi_threaded

Alias for field number 1

single_threaded

Alias for field number 0

exception pyzor.engines.common.DatabaseError

Bases: exceptions.Exception

class pyzor.engines.common.Record(r_count=0, wl_count=0, r_entered=None, r_updated=None, wl_entered=None, wl_updated=None)

Bases: object

Prefix conventions used in this class: r = report (spam) wl = whitelist

r_increment()
r_update()
wl_increment()
wl_update()

pyzor.engines.gdbm

Gdbm database engine.

class pyzor.engines.gdbm_.GdbmDBHandle(fn, mode, max_age=None)

Bases: object

absolute_source = True
apply_method(method, varargs=(), kwargs=None)
classmethod decode_record(s)
static decode_record_0(s)
classmethod decode_record_1(s)
classmethod encode_record(value)
fields = ('r_count', 'r_entered', 'r_updated', 'wl_count', 'wl_entered', 'wl_updated')
items()
iteritems()
log = <logging.Logger object at 0x7f94ff8cbe10>
reorganize_period = 86400
start_reorganizing()
start_syncing()
sync_period = 60
this_version = '1'
class pyzor.engines.gdbm_.ThreadedGdbmDBHandle(fn, mode, max_age=None, bound=None)

Bases: pyzor.engines.gdbm_.GdbmDBHandle

Like GdbmDBHandle, but handles multi-threaded access.

apply_method(method, varargs=(), kwargs=None)

pyzor.engines.mysql

MySQLdb database engine.

class pyzor.engines.mysql.MySQLDBHandle(fn, mode, max_age=None)

Bases: object

absolute_source = False
items()
iteritems()
log = <logging.Logger object at 0x7f94ff8cbe10>
reconnect()
reconnect_period = 60
reorganize_period = 86400
start_reorganizing()
class pyzor.engines.mysql.ProcessMySQLDBHandle(fn, mode, max_age=None)

Bases: pyzor.engines.mysql.MySQLDBHandle

reconnect()
class pyzor.engines.mysql.ThreadedMySQLDBHandle(fn, mode, max_age=None, bound=None)

Bases: pyzor.engines.mysql.MySQLDBHandle

reconnect()

pyzor.engines.redis

Redis database engine.

class pyzor.engines.redis_.RedisDBHandle(fn, mode, max_age=None)

Bases: object

absolute_source = False
items()
iteritems()
log = <logging.Logger object at 0x7f94ff8cbe10>
class pyzor.engines.redis_.ThreadedRedisDBHandle(fn, mode, max_age=None, bound=None)

Bases: pyzor.engines.redis_.RedisDBHandle

pyzor.engines.redis_.decode_date(x)
pyzor.engines.redis_.encode_date(d)
pyzor.engines.redis_.safe_call(f)

Decorator that wraps a method for handling database operations.

Database backends for pyzord.

The database class must expose a dictionary-like interface, allowing access via __getitem__, __setitem__, and __delitem__. The key will be a forty character string, and the value should be an instance of the Record class.

If the database backend cannot store the Record objects natively, then it must transparently take care of translating to/from Record objects in __setitem__ and __getitem__.

The database class should take care of expiring old values at the appropriate interval.

pyzor.hacks

pyzor.hacks.py26

Hacks for Python 2.6

pyzor.hacks.py26.hack_all(email=True, select=True)

Apply all Python 2.6 patches.

pyzor.hacks.py26.hack_email()

The python2.6 version of email.message_from_string, doesn’t work with unicode strings. And in python3 it will only work with a decoded.

So switch to using only message_from_bytes.

pyzor.hacks.py26.hack_select()

The python2.6 version of SocketServer does not handle interrupt calls from signals. Patch the select call if necessary.

Various hack to make pyzor compatible with different Python versions.

pyzor.account

A collection of utilities that facilitate working with Pyzor accounts.

Note that accounts are not necessary (on the client or server), as an “anonymous” account always exists.

class pyzor.account.Account(username, salt, key)

Bases: object

pyzor.account.hash_key(key, user, hash_=<built-in function openssl_sha1>)

Returns the hash key for this username and password.

lower(H(U + ‘:’ + lower(K))) K is key (hex string) U is username H is the hash function (currently SHA1)

pyzor.account.key_from_hexstr(s)
pyzor.account.sign_msg(hashed_key, timestamp, msg, hash_=<built-in function openssl_sha1>)

Converts the key, timestamp (epoch seconds), and msg into a digest.

lower(H(H(M) + ‘:’ T + ‘:’ + K)) M is message T is integer epoch timestamp K is hashed_key H is the hash function (currently SHA1)

pyzor.account.verify_signature(msg, user_key)

Verify that the provided message is correctly signed.

The message must have “User”, “Time”, and “Sig” headers.

If the signature is valid, then the function returns normally. If the signature is not valid, then a pyzor.SignatureError() exception is raised.

pyzor.client

Networked spam-signature detection client.

>>> import pyzor
>>> import pyzor.client
>>> import pyzor.digest
>>> import pyzor.config

To load the accounts file:

>>> accounts = pyzor.config.load_accounts(filename)

To create a client (to then issue commands):

>>> client = pyzor.client.Client(accounts)

To create a client, using the anonymous user:

>>> client = pyzor.client.Client()

To get a digest (of an email.message.Message object, or similar):

>>> digest = pyzor.digest.get_digest(msg)

To query a server (where address is a (host, port) pair):

>>> client.ping(address)
>>> client.info(digest, address)
>>> client.report(digest, address)
>>> client.whitelist(digest, address)
>>> client.check(digest, address)

To query the default server (public.pyzor.org):

>>> client.ping()
>>> client.info(digest)
>>> client.report(digest)
>>> client.whitelist(digest)
>>> client.check(digest)

Response will contain, depending on the type of request, some of the following keys (e.g. client.ping()[‘Code’]):

All responses will have: - ‘Diag’ ‘OK’ or error message - ‘Code’ ‘200’ if OK - ‘PV’ Protocol Version - ‘Thread’

info and check responses will also contain: - ‘[WL-]Count’ Whitelist/Blacklist count

info responses will also have: - ‘[WL-]Entered’ timestamp when message was first whitelisted/blacklisted - ‘[WL-]Updated’ timestamp when message was last whitelisted/blacklisted

class pyzor.client.BatchClient(accounts=None, timeout=None, spec=None, batch_size=10)

Bases: pyzor.client.Client

Like the normal Client but with support for batching reports.

flush()

Deleting any saved digest reports.

force()

Force send any remaining reports.

report(digest, address=('public.pyzor.org', 24441))
whitelist(digest, address=('public.pyzor.org', 24441))
class pyzor.client.CheckClientRunner(routine, r_count=0, wl_count=0)

Bases: pyzor.client.ClientRunner

handle_response(response, message)
class pyzor.client.Client(accounts=None, timeout=None, spec=None)

Bases: object

check(digest, address=('public.pyzor.org', 24441))
info(digest, address=('public.pyzor.org', 24441))
max_packet_size = 8192
ping(address=('public.pyzor.org', 24441))
pong(digest, address=('public.pyzor.org', 24441))
read_response(sock, expected_id)
report(digest, address=('public.pyzor.org', 24441))
send(msg, address=('public.pyzor.org', 24441))
timeout = 5
whitelist(digest, address=('public.pyzor.org', 24441))
class pyzor.client.ClientRunner(routine)

Bases: object

handle_response(response, message)

mesaage is a string we’ve built up so far

run(server, args, kwargs=None)
class pyzor.client.InfoClientRunner(routine)

Bases: pyzor.client.ClientRunner

handle_response(response, message)

pyzor.config

Functions that handle parsing pyzor configuration files.

pyzor.config.expand_homefiles(homefiles, category, homedir, config)

Set the full file path for these configuration files.

pyzor.config.load_access_file(access_fn, accounts)

Load the ACL from the specified file, if it exists, and return an ACL dictionary, where each key is a username and each value is a set of allowed permissions (if the permission is not in the set, then it is not allowed).

‘accounts’ is a dictionary of accounts that exist on the server - only the keys are used, which must be the usernames (these are the users that are granted permission when the ‘all’ keyword is used, as described below).

Each line of the file should be in the following format:
operation : user : allow|deny

where ‘operation’ is a space-separated list of pyzor commands or the keyword ‘all’ (meaning all commands), ‘username’ is a space-separated list of usernames or the keyword ‘all’ (meaning all users) - the anonymous user is called “anonymous”, and “allow|deny” indicates whether or not the specified user(s) may execute the specified operations.

The file is processed from top to bottom, with the final match for user/operation being the value taken. Every file has the following implicit final rule:

all : all : deny
If the file does not exist, then the following default is used:
check report ping info : anonymous : allow
pyzor.config.load_accounts(filepath)

Layout of file is: host : port : username : salt,key

pyzor.config.load_passwd_file(passwd_fn)

Load the accounts from the specified file.

Each line of the file should be in the format:
username : key

If the file does not exist, then an empty dictionary is returned; otherwise, a dictionary of (username, key) items is returned.

pyzor.config.load_servers(filepath)

Load the servers file.

pyzor.config.setup_logging(log_name, filepath, debug)

Setup logging according to the specified options. Return the Logger object.

pyzor.digest

class pyzor.digest.DataDigester(msg, spec=None)

Bases: object

The major workhouse class.

atomic_num_lines = 4
digest
classmethod digest_payloads(msg)
email_ptrn = <_sre.SRE_Pattern object at 0x7f94ffaa80f8>
handle_atomic(lines)

We digest everything.

handle_line(line)
handle_pieced(lines, spec)

Digest stuff according to the spec.

longstr_ptrn = <_sre.SRE_Pattern object at 0x7f95038da880>
min_line_length = 8
classmethod normalize(s)
static normalize_html_part(s)
classmethod should_handle_line(s)
unwanted_txt_repl = ''
url_ptrn = <_sre.SRE_Pattern object at 0x7f95002c6ac0>
value
ws_ptrn = <_sre.SRE_Pattern object at 0x7f94ffaa3300>
class pyzor.digest.HTMLStripper(collector)

Bases: HTMLParser.HTMLParser

Strip all tags from the HTML.

handle_data(data)

Keep track of the data.

class pyzor.digest.PrintingDataDigester(msg, spec=None)

Bases: pyzor.digest.DataDigester

Extends DataDigester: prints out what we’re digesting.

handle_line(line)
pyzor.digest.get_digest(msg)

pyzor.forwarder

class pyzor.forwarder.Forwarder(forwarding_client, remote_servers, max_queue_size=10000)

Bases: object

Forwards digest to remote pyzor servers

queue_forward_request(digest, whitelist=False)

If forwarding is enabled, insert a digest into the forwarding queue if whitelist is True, the digest will be forwarded as whitelist request if the queue is full, the digest is dropped

start_forwarding()

start the forwarding thread

stop_forwarding()

disable forwarding and tell the forwarding thread to end itself

pyzor.message

This modules contains the various messages used in the pyzor client server communication.

class pyzor.message.CheckRequest(digest=None)

Bases: pyzor.message.SimpleDigestBasedRequest

op = 'check'
class pyzor.message.ClientSideRequest

Bases: pyzor.message.Request

op = None
setup()
class pyzor.message.InfoRequest(digest=None)

Bases: pyzor.message.SimpleDigestBasedRequest

op = 'info'
class pyzor.message.Message

Bases: email.message.Message

ensure_complete()
init_for_sending()
setup()
class pyzor.message.PingRequest

Bases: pyzor.message.ClientSideRequest

op = 'ping'
class pyzor.message.PongRequest(digest=None)

Bases: pyzor.message.SimpleDigestBasedRequest

op = 'pong'
class pyzor.message.ReportRequest(digest=None, spec=None)

Bases: pyzor.message.SimpleDigestSpecBasedRequest

op = 'report'
class pyzor.message.Request

Bases: pyzor.message.ThreadedMessage

This is the class that should be used to read in Requests of any type. Subclasses are responsible for setting ‘Op’ if they are generating a message,

ensure_complete()
get_op()
class pyzor.message.Response

Bases: pyzor.message.ThreadedMessage

ensure_complete()
get_code()
get_diag()
head_tuple()
is_ok()
ok_code = 200
class pyzor.message.SimpleDigestBasedRequest(digest=None)

Bases: pyzor.message.ClientSideRequest

add_digest(digest)
class pyzor.message.SimpleDigestSpecBasedRequest(digest=None, spec=None)

Bases: pyzor.message.SimpleDigestBasedRequest

class pyzor.message.ThreadId

Bases: int

error_value = 0
full_range = (0, 65536)
classmethod generate()
in_ok_range()
ok_range = (1024, 65536)
class pyzor.message.ThreadedMessage

Bases: pyzor.message.Message

ensure_complete()
get_protocol_version()
get_thread()
init_for_sending()
set_thread(i)
class pyzor.message.WhitelistRequest(digest=None, spec=None)

Bases: pyzor.message.SimpleDigestSpecBasedRequest

op = 'whitelist'

pyzor.server

Networked spam-signature detection server.

The server receives the request in the form of a RFC5321 message, and responds with another RFC5321 message. Neither of these messages has a body - all of the data is encapsulated in the headers.

The response headers will always include a “Code” header, which is a HTTP-style response code, and a “Diag” header, which is a human-readable message explaining the response code (typically this will be “OK”).

Both the request and response headers always include a “PV” header, which indicates the protocol version that is being used (in a major.minor format). Both the requestion and response headers also always include a “Thread”, which uniquely identifies the request (this is a requirement of using UDP). Responses to requests may arrive in any order, but the “Thread” header of a response will always match the “Thread” header of the appropriate request.

Authenticated requests must also have “User”, “Time” (timestamp), and “Sig” (signature) headers.

class pyzor.server.BoundedThreadingServer(address, database, passwd_fn, access_fn, max_threads, forwarding_server=None)

Bases: pyzor.server.ThreadingServer

Same as ThreadingServer but this also accepts a limited number of concurrent threads.

process_request(request, client_address)
process_request_thread(request, client_address)
class pyzor.server.ProcessServer(address, database, passwd_fn, access_fn, max_children=40, forwarding_server=None)

Bases: SocketServer.ForkingMixIn, pyzor.server.Server

A multi-processing version of the pyzord server. Each connection is served in a new process. This may not be suitable for all database types.

class pyzor.server.RequestHandler(*args, **kwargs)

Bases: SocketServer.DatagramRequestHandler

Handle a single pyzord request.

dispatches = {'info': <function handle_info at 0x7f94ff63b488>, 'whitelist': <function handle_whitelist at 0x7f94ff63b410>, 'ping': None, 'report': <function handle_report at 0x7f94ff63b398>, 'pong': <function handle_pong at 0x7f94ff63b2a8>, 'check': <function handle_check at 0x7f94ff63b320>}
handle()

Handle a pyzord operation, cleanly handling any errors.

handle_check(digest, record)

Handle the ‘check’ command.

This command returns the spam/ham counts for the specified digest.

handle_error(code, message)

Create an appropriate response for an error.

handle_info(digest, record)

Handle the ‘info’ command.

This command returns diagnostic data about a digest (timestamps for when the digest was first/last seen as spam/ham, and spam/ham counts).

handle_pong(digest, _)

Handle the ‘pong’ command.

This command returns maxint for report counts and 0 whitelist.

handle_report(digest, record)

Handle the ‘report’ command.

This command increases the spam count for the specified digest.

handle_whitelist(digest, record)

Handle the ‘whitelist’ command.

This command increases the ham count for the specified digest.

class pyzor.server.Server(address, database, passwd_fn, access_fn, forwarder=None)

Bases: SocketServer.UDPServer

The pyzord server. Handles incoming UDP connections in a single thread and single process.

load_config()

Reads the configuration files and loads the accounts and ACLs.

max_packet_size = 8192
reload_handler(*args, **kwargs)

Handler for the SIGUSR1 signal. This should be used to reload the configuration files.

shutdown_handler(*args, **kwargs)

Handler for the SIGTERM signal. This should be used to kill the daemon and ensure proper clean-up.

time_diff_allowance = 180
class pyzor.server.ThreadingServer(address, database, passwd_fn, access_fn, forwarder=None)

Bases: SocketServer.ThreadingMixIn, pyzor.server.Server

A threaded version of the pyzord server. Each connection is served in a new thread. This may not be suitable for all database types.

Networked spam-signature detection.

exception pyzor.AuthorizationError

Bases: pyzor.CommError

The signature was valid, but the user is not permitted to do the requested action.

exception pyzor.CommError

Bases: exceptions.Exception

Something in general went wrong with the transaction.

exception pyzor.IncompleteMessageError

Bases: pyzor.ProtocolError

A complete requested was not received.

exception pyzor.ProtocolError

Bases: pyzor.CommError

Something is wrong with talking the protocol.

exception pyzor.SignatureError

Bases: pyzor.CommError

Unknown user, signature on msg invalid, or not within allowed time range.

exception pyzor.TimeoutError

Bases: pyzor.CommError

The connection timed out.

exception pyzor.UnsupportedVersionError

Bases: pyzor.ProtocolError

Client is using an unsupported protocol version.