Welcome to Palladium!

Palladium makes it easy to set up predictive analytics services as web services. It is a pluggable framework for developing real-world machine learning solutions. It provides generic implementations for things commonly needed in machine learning, such as dataset loading, model training with parameter search, a web service, and persistence capabilities, allowing you to concentrate on the core task of developing an accurate machine learning model. A well-tested core framework that is shared by a number of different services can reduce development and maintenance costs, since the services are based on the same code base and follow identical processes. Palladium adds a web service overhead of only a few milliseconds, making it possible to set up services with low response times.

A configuration file lets you conveniently tie together existing components with components that you developed. For example, if you want to load a dataset from a CSV file or an SQL database, train an SVM classifier to predict one of the columns in the data given the others, and then find out about your model’s accuracy, Palladium lets you do that without writing a single line of code. However, it is also possible to integrate your own solutions independently.

Illustration of Palladium

Much of Palladium’s functionality is based on the scikit-learn library, so you will often find yourself consulting the scikit-learn documentation when developing with Palladium. Although Palladium is implemented in Python, it supports other languages and ships with examples of how to integrate and expose R and Julia models.

For efficient deployment of services based on Palladium, a script is provided that creates Docker images automatically. To manage and monitor a number of Palladium service instances in a cluster, Mesosphere’s Mesos framework Marathon can be used for deployment, which also enables scalability by placing a variable number of service nodes behind a load balancer. Examples of how to create Palladium Docker images and how to use them with Mesos / Marathon are part of the documentation. Other important aspects, especially relevant in enterprise contexts for setting up production services, such as authentication, logging, and monitoring, can be integrated easily via pluggable decorator lists in the configuration file of a service, keeping track of service calls and corresponding permissions.

User’s Guide

This part of the documentation is mostly prose. It starts with installation instructions for setting up Palladium for development, then develops a simple pipeline for predicting classes in the Iris flower dataset. It explains the concepts behind Palladium as it develops the application.

Installation

Palladium requires Python 3.5 or later to run. If you are currently using an older version of Python, you might want to check the FAQ entry about virtual environments. Some of Palladium’s dependencies, such as numpy, scipy, and scikit-learn, may require a C compiler to install.

All of Palladium’s dependencies are listed in the requirements.txt file. You can use either pip or conda to install the dependencies from this file.

For most installations, it is recommended to install Palladium and its dependencies inside a virtualenv or a conda environment. The following commands assume that you have your environment active.
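As a sketch of how you might create such an environment with the standard library’s venv module (the environment name below is just an example):

```shell
# Create an isolated environment (the name "palladium-env" is an example)
python3 -m venv palladium-env
# Activate it for the current shell session
. palladium-env/bin/activate
# Confirm that "python" now points into the environment
python -c 'import sys; print(sys.prefix)'
```

Conda users would instead use `conda create` and `source activate`, as shown in the source-install instructions below.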

Install from PyPI

It is good practice to install dependencies with exactly the same version numbers that the release was made with. You can find the requirements.txt that defines those version numbers in the top-level directory of Palladium’s source tree, or you can download it here: requirements.txt. You can install the dependencies with the following command:

pip install -r requirements.txt

In order to install Palladium from PyPI, simply run:

pip install palladium

Install from binstar

To install Palladium with conda install, you first have to add the following binstar channel:

conda config --add channels https://conda.binstar.org/ottogroup
conda install palladium

Note

Right now, versions are only available for the linux-64 and osx-64 platforms on our binstar channel.

Install from source

Download and navigate to your copy of the Palladium source, then run:

cd palladium
pip install -r requirements.txt

To install the Palladium package itself, run:

python setup.py install  # or 'setup.py develop' if you intend to develop Palladium itself

If you prefer conda over using pip, run these commands instead to install:

cd palladium
conda create -n palladium python=3 --file requirements.txt  # create conda environment
source activate palladium  # activate conda environment
python setup.py install

Note

The virtualenv or conda create and source activate commands above generate and activate an environment in which specific Python package versions can be installed for a project without interfering with other Python projects. This environment has to be activated in each context in which you want to call Palladium scripts (e.g., in a shell). So if you run into problems finding the Palladium scripts or get errors about missing packages, check that you have activated the corresponding environment. To deactivate an environment, simply run deactivate (or source deactivate for conda environments).

Note

If you intend to develop Palladium itself or if you want to run the tests, you additionally need to install the requirements-dev.txt with pip install -r requirements-dev.txt (or conda install --file requirements-dev.txt in the Anaconda setting).

Once you have Palladium installed, you should be able to use the pld-version command to find out which version of Palladium you’re using:

pld-version

Now that you’ve successfully installed Palladium, it’s time to head over to the Tutorial to learn about what it can do for you.

Tutorial

Run the Iris example

In this first part of the tutorial, we will run the simple Iris example that is included in the source distribution of Palladium. The Iris data set consists of a number of entries describing Iris flowers of three different types and is often used as an introductory example for machine learning.

It is assumed that you have already run through the Installation. You can download the files needed for the tutorial here: config.py and iris.data. Alternatively, you can find the files in the source tree of Palladium, which includes the Iris example in the examples/iris folder. Navigate to that folder and list its contents:

cd examples/iris
ls

You will notice that there are two files here. One is iris.data which is a CSV file with the dataset we want to train with. For each training example, iris.data defines four features and one of the three classes to predict.

The other file, config.py, is our Palladium configuration file. It contains all the configuration necessary to load the dataset from the CSV file and to train a logistic regression classifier on it.

All the following commands require you to set an environment variable to point to the config.py file. In general, when using any of Palladium’s scripts, you will want to have that environment variable set and pointing to your current project’s config.py. Using Bash, you could set the PALLADIUM_CONFIG environment variable so that it is picked up by subsequent calls to Palladium like so:

export PALLADIUM_CONFIG=config.py

Now we’re all set to fit our Iris model:

pld-fit

This command will print a number of lines and hopefully finish with the message Wrote model with version 1. If you list the contents of the directory you are in again, you will notice that there is a new file called iris-model.db. This is the SQLite database that Palladium created and saved our trained model in. We can now use this trained model and test it on a held-out test set:

pld-test

This will output an accuracy score, which should be something around 96 percent.

If you run pld-fit again, you’ll notice that it outputs Wrote model with version 2. The next call to pld-test will use that newer model to run tests. To test the first model that you trained, run:

pld-test --model-version=1

Let us now use the web service included with Palladium to generate predictions with our trained model. Run this command to bring up the web server:

pld-devserver

Now type this address into your browser’s address bar (assuming that you’re running the server locally):

http://localhost:5000/predict?sepal%20length=5.2&sepal%20width=3.5&petal%20length=1.5&petal%20width=0.2

The server should print out something like this:

{
    "result": "Iris-virginica",
    "metadata": {
        "service_name": "iris",
        "error_code": 0,
        "status": "OK",
        "service_version": "0.1"
    }
}

At this point, we’ve already run through the most important scripts that Palladium provides.

Understand Iris’ config.py

In this section, we’ll take a closer look at the Iris example’s config.py file and how it wires together the components that we use to train and predict on the Iris dataset.

Open up the config.py file inside the examples/iris directory in Palladium’s source folder and let us now walk step-by-step through the entries of this file.

Note

Despite the .py file ending, config.py is not conventional Python source code. The file ending exists to help your editor use Python syntax highlighting; all that config.py contains is a single Python dictionary.
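For illustration, a heavily stripped-down config.py could consist of nothing but a dictionary like the following (an abbreviated sketch of the entries discussed in the sections below, not the full Iris configuration):

```python
# config.py -- a single Python dictionary; keys name components,
# values configure them via '__factory__' plus keyword arguments
{
    'dataset_loader_train': {
        '__factory__': 'palladium.dataset.Table',
        'path': 'iris.data',
        },
    'model': {
        '__factory__': 'sklearn.linear_model.LogisticRegression',
        },
    'model_persister': {
        '__factory__': 'palladium.persistence.File',
        'path': 'model-{version}.pickle',
        },
}
```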

Dataset loaders

The first configuration entry we’ll find inside config.py is called dataset_loader_train. This is where we configure the dataset loader that loads the training data from the CSV file, and define which columns should be used as data and target values. The first entry inside dataset_loader_train defines the type of dataset loader we want to use, namely palladium.dataset.Table:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.Table',

The rest of what is inside dataset_loader_train are the keyword arguments used to initialize the Table component. The full definition of dataset_loader_train looks like this:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.Table',
    'path': 'iris.data',
    'names': [
        'sepal length',
        'sepal width',
        'petal length',
        'petal width',
        'species',
        ],
    'target_column': 'species',
    'sep': ',',
    'nrows': 100,
    }

You can now take a look at Table's API to find out what parameters a Table accepts and what they mean. But to summarize: the path is the path to the CSV file. In our case, this is the relative path to iris.data. Because our CSV file doesn’t have the column names in the first line, we have to provide the column names using the names parameter. The target_column defines which of the columns should be used as the value to be predicted; this is the last column, which we named species. The nrows parameter tells Table to return only the first hundred samples from our CSV file.
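In spirit, what Table does amounts to something like the following standard-library sketch (Table itself uses pandas under the hood; the helper code and the two-row sample here are purely illustrative):

```python
# Illustrative sketch of Table's job: read a headerless CSV, attach
# column names, and split the rows into feature data and target values.
import csv
import io

CSV_SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'species']
target_column = 'species'

rows = [dict(zip(names, row)) for row in csv.reader(io.StringIO(CSV_SAMPLE))]
data = [[float(r[n]) for n in names if n != target_column] for r in rows]
target = [r[target_column] for r in rows]

print(data[0], target[0])  # first sample's features and its class
```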

If you take a look at the next section in the config file, which is dataset_loader_test, you will notice that it is very similar to the first one. In fact, the only difference between dataset_loader_train and dataset_loader_test is that the latter uses a different subset of the samples available in the same CSV file. So instead of using nrows, dataset_loader_test uses the skiprows parameter and thus skips the first hundred examples (in order to separate training and testing data):

'skiprows': 100,

Under the hood, Table uses pandas.io.parsers.read_table() to do the actual loading. Any additional named parameters passed to Table are passed on to read_table(). That is the case for the sep parameter in our example, but there are a lot of other useful options, too, like usecols, skiprows and so on.

Palladium also includes a dataset loader for loading data from an SQL database: palladium.dataset.SQL.

But if you find yourself needing to write your own dataset loader, that is pretty easy to do: take a look at Palladium’s DatasetLoader interface, which documents what a DatasetLoader like Table needs to look like.
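As a sketch, a dataset loader is essentially a callable that returns a (data, target) tuple. The class below is hypothetical (it generates random toy data rather than reading a file), but it shows the shape such a component takes:

```python
import random


class RandomDataset:
    """A toy dataset loader sketch: generates random data instead of
    reading it from disk.  A real loader would read a file or database."""

    def __init__(self, n_samples=10):
        self.n_samples = n_samples

    def __call__(self):
        # Return the (data, target) tuple that Palladium expects
        data = [[random.random() for _ in range(4)]
                for _ in range(self.n_samples)]
        target = [random.choice(['a', 'b']) for _ in range(self.n_samples)]
        return data, target
```

In the configuration file, you would then point `'__factory__'` at your class (e.g., `'mypackage.RandomDataset'`) and pass `n_samples` as a keyword argument.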

Model

The next section in our Iris configuration example is model. Here we define which machine learning algorithm we intend to use. In our case we’ll be using a logistic regression classifier out of scikit-learn:

'model': {
    '__factory__': 'sklearn.linear_model.LogisticRegression',
    'C': 0.3,
    },

Notice how we parametrize LogisticRegression with the regularization parameter C set to 0.3. To find out which other parameters exist for the LogisticRegression classifier, refer to the scikit-learn docs.

If you’ve written your own scikit-learn estimator before, you’ll already know how to write your own palladium.interfaces.Model class. You’ll want to implement fit() for model fitting, and predict() for prediction of target values. And possibly predict_proba() if you’re dealing with class probabilities.
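For a feel of how little is required, here is a toy classifier sketch with the fit() and predict() methods just described. It always predicts the most frequent class seen during training; real models will usually just be scikit-learn estimators or pipelines:

```python
from collections import Counter


class MajorityClassifier:
    """Toy model sketch: always predicts the majority class from fit()."""

    def fit(self, X, y):
        # Remember the most common target value
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the majority class for every sample
        return [self.majority_ for _ in X]
```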

If you need to pre-process your data, say with scaling, value imputation, feature selection, or the like, before passing it into the ML algorithm (such as the LogisticRegression classifier), you’ll want to take a look at scikit-learn pipelines. A Palladium model is not bound to be a simple estimator class; it can be a composite of several pre-processing steps or transformations combined with the algorithm.

At this point, feel free to change the configuration file to maybe try out different values for C. Can you find a setting for C that produces better accuracy?

Model persister

Usually we’ll want the pld-fit command to save the trained model to disk.

The model_persister in the Iris config.py file is set up to save those models into a SQLite database. Let us take a look at that part of the configuration:

'model_persister': {
    '__factory__': 'palladium.persistence.CachedUpdatePersister',
    'update_cache_rrule': {'freq': 'HOURLY'},
    'impl': {
        '__factory__': 'palladium.persistence.Database',
        'url': 'sqlite:///iris-model.db',
        },
    },

The palladium.persistence.CachedUpdatePersister wraps the persister that is actually responsible for reading and writing models. It is possible to provide an update rule that specifies the intervals at which to update the model. In the configuration above, update_cache_rrule is set to an hourly update (in real applications, the frequency will likely be much lower, e.g., daily or weekly). For details on how to define these rules, refer to the python-dateutil docs. If no update_cache_rrule is provided, the model will not be updated automatically. The impl entry of this model persister specifies the actual persister to be wrapped.
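By analogy with the hourly rule above, a daily schedule would presumably look like the fragment below (check the python-dateutil rrule docs for the full set of supported keywords):

```python
'update_cache_rrule': {'freq': 'DAILY'},
```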

The palladium.persistence.Database persister takes a single argument url which is the URL of the database to save the fitted model into. It will automatically create a table called models if such a table doesn’t exist yet. Please refer to the SQLAlchemy docs for details on which databases are supported, and how to form the database URL.

Palladium ships with another model persister called palladium.persistence.File that writes pickles to the file system. If you want to store your model anywhere else, or if you do not use Python’s pickle but something else, you might want to take a look at the ModelPersister interface, which describes the necessary methods. The location for storing the files can be chosen freely. However, the path has to contain a placeholder for adding the model’s version:

'model_persister': {
    '__factory__': 'palladium.persistence.CachedUpdatePersister',
    'impl': {
        '__factory__': 'palladium.persistence.File',
        'path': 'model-{version}.pickle',
        },
    },

If you prefer to use a REST backend like Artifactory for persisting your models, you can use the Rest persister:

'model_persister': {
    '__factory__': 'palladium.persistence.CachedUpdatePersister',
    'impl': {
        '__factory__': 'palladium.persistence.Rest',
        'url': 'http://localhost:8081/artifactory/modelz/{version}',
        'auth': ('username', 'passw0rd'),
        },
    },

Predict service

The next component in the Iris example configuration is called predict_service. The palladium.interfaces.PredictService is the workhorse behind what is happening in the /predict HTTP endpoint. Let us take a look at how it is configured:

'predict_service': {
    '__factory__': 'palladium.server.PredictService',
    'mapping': [
        ('sepal length', 'float'),
        ('sepal width', 'float'),
        ('petal length', 'float'),
        ('petal width', 'float'),
        ],
    }

Again, the specific implementation of the predict_service that we use is specified through the __factory__ setting.

The mapping defines which request parameters are to be expected. In this example, we expect a float number for each of sepal length, sepal width, petal length, petal width. Note that this is exactly the order in which the data was fed into the algorithm for model fitting.
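The correspondence between request parameters and the feature vector can be sketched like so (a purely illustrative, standard-library helper; not Palladium’s actual request handling):

```python
# Sketch: how /predict query parameters map onto the feature vector.
# The order of the mapping entries is the order the model was trained on.
from urllib.parse import urlencode, parse_qs

mapping = [
    ('sepal length', 'float'),
    ('sepal width', 'float'),
    ('petal length', 'float'),
    ('petal width', 'float'),
]

params = {'sepal length': '5.2', 'sepal width': '3.5',
          'petal length': '1.5', 'petal width': '0.2'}
query = urlencode(params)          # what the client sends
parsed = parse_qs(query)           # what the server receives

# Convert each parameter with its declared type, in mapping order
sample = [float(parsed[name][0]) for name, _type in mapping]
print(sample)
```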

An example request might then look like this (assuming that you’re running a server locally on port 5000):

http://localhost:5000/predict?sepal%20length=5.2&sepal%20width=3.5&petal%20length=1.5&petal%20width=0.2

The palladium.server.PredictService implementation that we use in this example has some more settings.

Its responsibility is also to create an HTTP response. In our example, if the prediction was successful (i.e., no errors whatsoever occurred), then the PredictService will generate a JSON response with an HTTP status code of 200:

{
    "result": "Iris-virginica",
    "metadata": {
        "service_name": "iris",
        "error_code": 0,
        "status": "OK",
        "service_version": "0.1"
    }
}

In case of a malformed request, you will see a status code of 400 and this response body:

{
    "metadata": {
        "service_name": "iris",
        "error_message": "BadRequest: ...",
        "error_code": -1,
        "status": "ERROR",
        "service_version": "0.1"
    }
}

If you want the predict service to work differently, chances are you can get away with subclassing PredictService and overriding one of its methods. E.g., to change what the API’s responses look like, you would override the response_from_prediction() and response_from_exception() methods, which are responsible for creating the JSON responses.
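To make the division of labor concrete, here is a self-contained sketch of the kind of logic such a response method implements, wrapping a prediction in the JSON envelope shown above. The helper function is hypothetical, not Palladium’s actual implementation:

```python
import json


def response_from_prediction(result, service_name='iris',
                             service_version='0.1'):
    """Hypothetical sketch: build the success envelope shown above."""
    body = {
        'result': result,
        'metadata': {
            'service_name': service_name,
            'error_code': 0,
            'status': 'OK',
            'service_version': service_version,
        },
    }
    return json.dumps(body), 200  # response body and HTTP status code
```

A subclass override of response_from_prediction() would return its customized payload in place of this default envelope.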

Customizing entry points for predict services

Predict service specifications can be customized by setting the entry_point and decorator_list_name in the configuration. It is also possible to specify more than one predict service, reachable via different entry points, e.g., if a model should be used in two different contexts with different parameters or response formats. Here is an example of how to specify two predict services with different entry points:

'predict_service1': {
    '__factory__': 'mypackage.server.PredictService',
    'mapping': [
        ('sepal length', 'float'),
        ('sepal width', 'float'),
        ('petal length', 'float'),
        ('petal width', 'float'),
        ],
    'entry_point': '/predict',
    'decorator_list_name': 'predict_decorators',
    },
'predict_service2': {
    '__factory__': 'mypackage.server.PredictServiceID',
    'mapping': [
        ('id', 'int'),
        ],
    'entry_point': '/predict-by-id',
    'decorator_list_name': 'predict_decorators_id',
    }

The decorator list specified by decorator_list_name does not have to be defined in the configuration; if it does not exist, no decorators will be used for this predict service.

Note

If entry_point and decorator_list_name are omitted, /predict and predict_decorators are used as default values, leading to the same behavior as in earlier Palladium versions.

Implementing the model as a pipeline

As mentioned in the Model section, it is entirely possible to implement your own machine learning model and use it. Remember that the only interface our model needs to implement is palladium.interfaces.Model. That means we can also use a scikit-learn Pipeline to do the job. Let us extend our Iris example to use a pipeline with two elements: a sklearn.preprocessing.PolynomialFeatures transform and a sklearn.linear_model.LogisticRegression classifier. To do this, create a file called iris.py in the same folder as our config.py, with the following contents:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def model(**kwargs):
    pipeline = Pipeline([
        ('poly', PolynomialFeatures()),
        ('clf', LogisticRegression()),
        ])
    pipeline.set_params(**kwargs)
    return pipeline

The special **kwargs argument allows us to pass configuration options for both the poly and the clf elements of our pipeline in the configuration file. Let us try this: we change the model entry in config.py to look like this:

'model': {
    '__factory__': 'iris.model',
    'clf__C': 0.3,
    },

Just like in our previous example, we are setting the C hyperparameter of our LogisticRegression to 0.3. However, this time we have to prefix the parameter name with clf__ to tell the pipeline that we want to set a parameter of the clf part of the pipeline. If you want to use grid search with this pipeline, keep in mind that you will also need to adapt the parameter’s name in the grid search section to clf__C.
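By way of illustration, a grid search section for this pipeline might define its parameter grid with the prefixed name like so (the value list here is made up):

```python
'grid_search': {
    'param_grid': {
        'clf__C': [0.1, 0.3, 1.0],
        },
    },
```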

It is also possible to use nested lists in configurations. With this feature, pipelines can also be defined purely through the configuration file, e.g.:

'model': {
    '__factory__': 'sklearn.pipeline.Pipeline',
    'steps': [
        ['clf', {'__factory__': 'sklearn.linear_model.LinearRegression'}],
        ],
    },

Deployment

In this section, you will find information on how to run a Palladium-based application with a web server other than Flask’s built-in solution, and how to benchmark your service. Additionally, you will find information on how to use the provided scripts to automatically generate Docker images of your service, and how to deploy such Docker images using Mesos / Marathon.

Web server installation

Palladium uses HTTP to respond to prediction requests. Through use of the WSGI protocol (via Flask), Palladium can be used together with a variety of web servers.

For convenience, a web server is included for development purposes. To start the built-in web server, use the pld-devserver command.

For production use, you probably want something faster and more robust. Many options are listed in the Flask deployment docs. If you follow any of these instructions, be aware that the Flask app in Palladium is available as palladium.wsgi:app. So here’s how you would start a Palladium prediction server using gunicorn:

export PALLADIUM_CONFIG=/path/to/myconfig.py
gunicorn palladium.wsgi:app

An example configuration for using nginx to proxy requests to gunicorn is also available. It can be used without modification for our example; place it in the /etc/nginx/sites-enabled/ folder and it becomes active after a restart of nginx. For convenience, it is reprinted here:

server {
    listen 80;

    server_name _;

    access_log  /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_pass         http://127.0.0.1:8000/;
        proxy_redirect     off;

        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    }
}

Note

In previous versions (Palladium 1.0.1 and older), the Flask app was accessed via palladium.server:app. This was changed in order to initialize the configuration during start-up.

Benchmarking the service with Apache Benchmark

To benchmark the response time of a service, existing tools like Apache Benchmark (ab) or siege can be used. They can be installed via system packages, e.g., on Ubuntu with sudo apt-get install apache2-utils and sudo apt-get install siege.

If a web server with the Iris predict service is running (either using the built-in pld-devserver or a faster solution as described in Web server installation), the ab benchmarking can be run as follows:

ab -n 1000 -c 10 "http://localhost:5000/predict?sepal%20length=5.2&sepal%20width=3.5&petal%20length=1.5&petal%20width=0.2"

This ab call assumes that the web server is available at port 5000 of localhost; 1000 requests are sent to the web server, with 10 concurrent requests at a time. The output provides a number of statistics about the response times of the calls performed.

Note

If there is an error in the sample request used, response times might be suspiciously low. If very low response times occur, it might be worth manually checking the response of the request URL used.

Note

ab does not allow using different URLs in a benchmark run. If different URLs are important for benchmarking, either siege or a multiple-URL patch for ab can be used.

Building a Docker image with your Palladium application

Building the Palladium base image

Here are instructions on how to build the Palladium base image. This isn’t usually necessary, as you’ll probably want to use the released base images for Palladium and add your application on top; see Building a Palladium app image.

A Dockerfile is available in the directory addons/docker/palladium_base_image for building a base image. You can download the file here: Dockerfile.

Run docker build in your terminal:

sudo docker build -t myname/palladium-base:1.1.0.1 .

A Docker image with the name myname/palladium-base:1.1.0.1 should now be created. You can check this with:

sudo docker images

Building a Palladium app image

Palladium has support for quickly building a Docker image to run your own application based on the Palladium base image. The Palladium base image can be pulled from Docker Hub as follows:

docker pull ottogroup/palladium-base

As an example, let’s build a Docker image for the Iris example that’s included in the source. We’ll use version 1.1.0.1 of the Palladium base image, and we’ll name our own image my-palladium-app, tagged as version 1.0. Thus, we invoke pld-dockerize like so:

pld-dockerize palladium-src/examples/iris ottogroup/palladium-base:1.1.0.1 myname/my-palladium-app:1.0

This command will in fact create two images: one that’s called my-palladium-app, another one that’s called my-palladium-app-predict. The latter extends the former by adding calls to automatically fit your model and start a web server.

By default, pld-dockerize will create the Dockerfile files and build the Docker images. With the -d flag, it will create the Dockerfile files only, so that you can modify Dockerfile-app and Dockerfile-predict according to your needs before building.

Your application’s folder (examples/iris in this case) should look like this:

.
|--- config.py
|--- setup.py (optional)
|--- requirements.txt (optional)
'--- python_packages (optional)
     |--- package1.tar.gz
     |--- package2.tar.gz
     '--- ...

You may put additional requirements as shown into a python_packages subdirectory.

To test your image you can:

  1. Create app images using pld-dockerize as shown above.

  2. Run the “predict” image (e.g., my-palladium-app-predict if you used my-palladium-app to create the image), and map the Docker container’s port 8000 to a local port (e.g., 8001):

    sudo docker run -d -p 8001:8000 my-palladium-app-predict
    
  3. Your application should be up and running now. You should be able to access this URL: http://localhost:8001/alive

Setup Palladium with Mesos / Marathon and Docker

This section describes how to set up Mesos / Marathon with a containerized Palladium application. If you have not built a Docker image with your Palladium application yet, you can follow the instructions provided in the Building a Docker image with your Palladium application section.

For the installation of Mesos and Marathon, you can follow the guide on Mesosphere. If you want to try it out locally first, we recommend setting up a single-node Mesosphere cluster. Before adding a new application to Marathon, you need to make sure that the Mesos slaves and Marathon are configured properly to work with Docker. To do so, follow the steps described in the Marathon documentation.

An easy way to add a new application to Marathon is to use its REST API. For this task you need a JSON file which contains the relevant information for Marathon. A basic example of such a JSON file could look like this:

{
    "id": "<app_name>",
    "container": {
        "docker": {
            "image": "<owner/palladium-app-name:version>",
            "network": "BRIDGE",
            "parameters": [
                {"key": "link", "value":"<some_container_to_link>"}
            ],
            "portMappings": [
                { "containerPort": 8000, "hostPort": 0, "servicePort": 9000,
                  "protocol": "tcp" }
            ]
        },
        "type": "DOCKER",
        "volumes": [
            {
                "containerPath": "/path/in/your/container",
                "hostPath": "/host/path",
                "mode": "RO"
            }
        ]
    },
    "cpus": 0.2,
    "mem": 256.0,
    "instances": 3,
    "healthChecks": [
        {
            "protocol": "HTTP",
            "portIndex": 0,
            "path": "/alive",
            "gracePeriodSeconds": 5,
            "intervalSeconds": 20,
            "maxConsecutiveFailures": 3
        }
    ],
    "upgradeStrategy": {
        "minimumHealthCapacity": 0.5
    }
}

You have to replace the Docker image name and port number (currently set to 8000) and, if there are any dependencies, specify links to other containers. If you have a Docker image of the Iris service available (named user/palladium-iris-predict:0.1), you can use this file:

{
    "id": "palladium-iris",
    "container": {
        "docker": {
            "image": "user/palladium-iris-predict:0.1",
            "network": "BRIDGE",
            "parameters": [
            ],
            "portMappings": [
                { "containerPort": 8000, "hostPort": 0, "servicePort": 9000,
                  "protocol": "tcp" }
            ]
        },
        "type": "DOCKER",
        "volumes": [
         ]
    },
    "cpus": 0.2,
    "mem": 256.0,
    "instances": 3,
    "healthChecks": [
        {
            "protocol": "HTTP",
            "portIndex": 0,
            "path": "/alive",
            "gracePeriodSeconds": 5,
            "intervalSeconds": 20,
            "maxConsecutiveFailures": 3
        }
    ],
    "upgradeStrategy": {
        "minimumHealthCapacity": 0.5
    }
}

Now you can send the JSON application file to Marathon via POST (assuming Marathon is available at localhost:8080):

curl -X POST -H "Content-Type: application/json" localhost:8080/v2/apps \
     -d @<path-to-json-file>

You can see the status of your Palladium service instances using the Marathon web user interface (available at http://localhost:8080 if you run the single node installation mentioned above) and can scale the number of instances as desired. Marathon keeps track of the Palladium instances. If a service instance breaks down, a new one will be started automatically.

Authorization

Sometimes you will want the Palladium web service’s entry points /predict and /alive to be secured by OAuth2 or similar. Defining predict_decorators and alive_decorators in the Palladium configuration file allows you to put any decorators in place to check authentication.

Let us first consider an example where you want to use HTTP Basic Auth to guard the entry points. Consider this code taken from the Flask snippets repository:

# file: mybasicauth.py

from functools import wraps
from flask import request, Response


def check_auth(username, password):
    """This function is called to check if a username /
    password combination is valid.
    """
    return username == 'admin' and password == 'secret'

def authenticate():
    """Sends a 401 response that enables basic auth"""
    return Response(
        'Could not verify your access level for that URL.\n'
        'You have to login with proper credentials', 401,
        {'WWW-Authenticate': 'Basic realm="Login Required"'})

def requires_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.authorization
        if not auth or not check_auth(auth.username, auth.password):
            return authenticate()
        return f(*args, **kwargs)
    return decorated

The requires_auth decorator can now be used to guard Flask views with basic authentication. Palladium allows us to add decorators to the /predict and /alive views that it defines itself. To do this, we only need to add this bit to the Palladium configuration file:

'predict_decorators': [
    'mybasicauth.requires_auth',
    ],

'alive_decorators': [
    'mybasicauth.requires_auth',
    ],

Of course, alternatively, you could set up your mod_wsgi server to take care of authentication.

Using Flask-OAuthlib to guard the two views using OAuth2 follows the same pattern. We will configure and use the flask_oauthlib.provider.OAuth2Provider for security. In our own package, we might have an instance of OAuth2Provider and a require_oauth decorator defined thus:

# file: myoauth.py

from flask_oauthlib.provider import OAuth2Provider
from palladium.server import app


oauth = OAuth2Provider(app)

# more setup code here... see Flask-OAuthlib

require_oauth = oauth.require_oauth('myrealm')

Alternatively, to get more decoupling from Palladium’s Flask app, you can use the following snippet inside your Palladium configuration and assign the Flask app to OAuth2Provider at application startup:

'oauth_init_app': {
    '__factory__': 'myoauth.oauth.init_app',
    'app': 'palladium.server.app',
    },

Now, to guard /predict and /alive with the previously defined require_oauth, add this to your configuration:

'predict_decorators': [
    'myoauth.require_oauth'
    ],

'alive_decorators': [
    'myoauth.require_oauth'
    ],

Web service

Palladium includes an HTTP service that can be used to make predictions over the web using models that were trained with the framework. There are two endpoints: /predict, which makes predictions, and /alive, which provides a simple health status.

Predict

The /predict service uses HTTP query parameters to accept input features, and outputs a JSON response. The number and types of parameters depend on the application. An example is provided as part of the Tutorial.

On success, /predict will always return an HTTP status of 200. An error is indicated by either status 400 or 500, depending on whether the error was caused by malformed user input, or by an error on the server.

The PredictService must be configured to define what parameters and types are expected. Here is an example configuration from the Tutorial:

'predict_service': {
    '__factory__': 'palladium.server.PredictService',
    'mapping': [
        ('sepal length', 'float'),
        ('sepal width', 'float'),
        ('petal length', 'float'),
        ('petal width', 'float'),
        ],
    },

An example request might then look like this (assuming that you’re running a server locally on port 5000):

curl "localhost:5000/predict?sepal%20length=6.3&sepal%20width=2.5&petal%20length=4.9&petal%20width=1.5"

The usual output for a successful prediction has both a result and a metadata entry. The metadata provides the service name and version as well as status information. An example:

{
    "result": "Iris-virginica",
    "metadata": {
        "service_name": "iris",
        "error_code": 0,
        "status": "OK",
        "service_version": "0.1"
        }
}

A failed request’s response contains a status set to ERROR, an error_code, and an error_message. There is generally no result. Here is an example:

{
    "metadata": {
        "service_name": "iris",
        "error_message": "BadRequest: ...",
        "error_code": -1,
        "status": "ERROR",
        "service_version": "0.1"
        }
}

It’s also possible to send a POST request instead of GET and predict for a number of samples at the same time. Say you want to predict the class of two Iris examples; then your POST body might look like this:

[
  {"sepal length": 6.3, "sepal width": 2.5, "petal length": 4.9, "petal width": 1.5},
  {"sepal length": 5.3, "sepal width": 1.5, "petal length": 3.9, "petal width": 0.5}
]

The response will generally look the same, with the exception that now there’s a list of predictions that’s returned:

{
    "result": ["Iris-virginica", "Iris-versicolor"],
    "metadata": {
        "service_name": "iris",
        "error_code": 0,
        "status": "OK",
        "service_version": "0.1"
        }
}
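For illustration, the POST body above can be built from Python using the standard library's json module; the actual HTTP call (which would need e.g. the third-party requests package, and a running server) is sketched as a comment so the snippet stays self-contained:

```python
import json

# The two Iris samples from the POST example above:
samples = [
    {"sepal length": 6.3, "sepal width": 2.5,
     "petal length": 4.9, "petal width": 1.5},
    {"sepal length": 5.3, "sepal width": 1.5,
     "petal length": 3.9, "petal width": 0.5},
]

# The POST body is simply the JSON-serialized list of samples:
body = json.dumps(samples)

# Sending it (assuming a local server on port 5000) could look like:
#   import requests
#   response = requests.post("http://localhost:5000/predict", data=body,
#                            headers={"Content-Type": "application/json"})
```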

If you need an output format different from the one implemented by PredictService, you can use a different class altogether by setting an appropriate __factory__ (though for convenience, that class will likely derive from PredictService).

A list of decorators may be configured such that they are called every time the /predict web service is invoked. To configure such a decorator, which acts exactly as if it were applied as a normal Python decorator, use the predict_decorators list setting. Here is an example:

'predict_decorators': [
    'my_package.my_predict_decorator',
    ],
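As an illustration of what such a decorator might do, here is a hypothetical my_package.my_predict_decorator that logs how long each prediction call takes (this is an example of our own, not part of Palladium):

```python
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)


def my_predict_decorator(func):
    """Log the duration of each call to the wrapped /predict view."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            logger.info("predict took %.3f seconds", time.time() - started)
    return wrapper
```

Because it is listed in predict_decorators, Palladium applies it to the /predict view exactly like a normal Python decorator; the wrapped view's return value passes through unchanged.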

Alive

The /alive service implements a simple health check. It’ll provide information such as the palladium_version in use, the current memory_usage by the web server process, and all metadata that has been defined in the configuration under the service_metadata entry. Here is an example for the Iris service:

{
    "palladium_version": "0.6",
    "service_metadata": {
        "service_name": "iris",
        "service_version": "0.1"
    },
    "memory_usage": 78,
    "model": {
        "updated": "2015-02-18T10:13:50.024478",
        "metadata": {
              "version": 2,
              "train_timestamp": "2015-02-18T09:59:34.480063"
        }
    },
    "process_metadata": {}
}

/alive can optionally check for the presence of data loaded into the process’ cache (process_store). That is because some scenarios require the model and/or additional data to be loaded in memory before they can answer requests efficiently (cf. palladium.persistence.CachedUpdatePersister and palladium.dataset.ScheduledDatasetLoader).

Say you expect the process_store to be filled with a data entry (perhaps because you’re using ScheduledDatasetLoader) before you’re able to answer requests, and you want /alive to return an error status (503) when that data hasn’t been loaded yet. Then you’d add the following entry to your configuration:

'alive': {
    'process_store_required': ['data'],
    },
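The effect of process_store_required can be sketched in a few lines; this is an illustration of the behaviour, not Palladium's actual implementation:

```python
def alive_status(process_store, required_keys):
    """Return the HTTP status /alive would answer with: 503 if any
    required entry is missing from the process store, else 200."""
    missing = [key for key in required_keys if key not in process_store]
    return 503 if missing else 200
```

With 'process_store_required': ['data'], the service reports 503 until the data entry has been loaded, after which it reports 200.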

List

The /list handler returns model and model persister data. Here’s some example output:

{
    "models": [
        {"train_timestamp": "2018-04-09T13:08:11.933814", "version": 1},
        {"train_timestamp": "2018-04-09T13:11:05.336124", "version": 2}
    ],
    "properties": {"active-model": "8", "db-version": "1.2"}
}

Fit, Update Model Cache, and Activate

Palladium allows for periodic updates of the model by use of the palladium.persistence.CachedUpdatePersister. For this to work, the web service’s model persister checks its model database source periodically for new versions of the model. Meanwhile, another process runs pld-fit and saves a new model into the same model database. When pld-fit is done, the web services will load the new model as part of the next periodic update.

The second option is to call the /fit web service endpoint, which will essentially run the equivalent of pld-fit, but in the web service’s process. This has a few drawbacks compared to the first method:

  • The fitting will run inside the same process as the web service. While the model is fitting, your web service will likely use considerably more memory and processing power.
  • In multi-server or multi-process environments, you must take care of updating existing model caches (e.g. when running CachedUpdatePersister) by hand. This can be done by calling the /update-model-cache endpoint for each server process.

An example request to trigger a fit looks like this (assuming that you’re running a server locally on port 5000):

curl -X POST localhost:5000/fit

The request will return immediately, after spawning a thread to do the actual fitting work. The JSON response contains the job’s ID, which we’ll need later to check the status of our job:

{"job_id": "1adf9b2d-0160-45f3-a81b-4d8e4edf2713"}

The /alive endpoint returns information about all jobs inside the process_metadata.jobs entry. After submitting the above job, we’ll find that calling /alive returns something like this:

{
    "palladium_version": "0.6",
    // ...
    "process_metadata": {
        "jobs": {
            "1adf9b2d-0160-45f3-a81b-4d8e4edf2713": {
                "func": "<fit function>",
                "info": "<MyModel>",
                "started": "2018-04-09 09:44:52.660732",
                "status": "finished",
                "thread": 139693771835136
            }
        }
    }
}

The finished status indicates that the job was successfully completed. info contains a string representation of the function’s return value.
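If you poll /alive from a script, extracting a job's status from the parsed response is straightforward. A small hypothetical helper (job_status is our own name, not a Palladium function):

```python
def job_status(alive_response, job_id):
    """Return the status string of a fit job from a parsed /alive
    response, or None if the job id is unknown."""
    jobs = alive_response.get("process_metadata", {}).get("jobs", {})
    job = jobs.get(job_id)
    return job["status"] if job is not None else None


# A trimmed-down /alive response like the example above:
alive = {
    "process_metadata": {
        "jobs": {
            "1adf9b2d-0160-45f3-a81b-4d8e4edf2713": {
                "func": "<fit function>",
                "status": "finished",
            }
        }
    }
}
```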

When using a cached persister, you may also want to run the /update-model-cache endpoint, which runs another job asynchronously, the same way that /fit does, that is, by returning an id and storing information about the job inside of process_metadata. /update-model-cache will update the cache of any caching model persisters, such as CachedUpdatePersister.

The /fit and /update-model-cache endpoints aren’t registered by default with the Flask app. To register the two endpoints, you can either call the Flask app’s add_url_rule method directly or use the convenience function palladium.server.add_url_rule() inside of your configuration file. Here is an example that registers the two endpoints:

'flask_add_url_rules': [
    {
        '__factory__': 'palladium.server.add_url_rule',
        'rule': '/fit',
        'view_func': 'palladium.server.fit',
        'methods': ['POST'],
    },
    {
        '__factory__': 'palladium.server.add_url_rule',
        'rule': '/update-model-cache',
        'view_func': 'palladium.server.update_model_cache',
        'methods': ['POST'],
    },
],

Another endpoint that’s not registered by default is /activate, which works just like its command line counterpart: it takes a model version and activates it in the model persister such that the next prediction will use the active model. The handler can be found at palladium.server.activate(). It requires a request parameter called model_version.
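Registering /activate follows the same pattern as the /fit and /update-model-cache entries shown above. A sketch of the configuration fragment (the POST method shown here is an assumption on our part):

```python
'flask_add_url_rules': [
    {
        '__factory__': 'palladium.server.add_url_rule',
        'rule': '/activate',
        'view_func': 'palladium.server.activate',
        'methods': ['POST'],
    },
],
```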

Scripts

Palladium includes a number of command-line scripts, many of which you may have already encountered in the Tutorial.

pld-test: test models

Test a model.

Uses 'dataset_loader_test' and 'model_persister' from the
configuration to load a test dataset to test the accuracy of a trained
model with.

Usage:
  pld-test [options]

Options:
  -h --help                  Show this screen.

  --model-version=<version>  The version of the model to be tested. If
                             not specified, the newest model will be used.

pld-list: list available models

List information about available models.

Uses the 'model_persister' from the configuration to display a list of
models and their metadata.

Usage:
  pld-list [options]

Options:
  -h --help                  Show this screen.

pld-version: display version number

Print the version number of Palladium.

Usage:
  pld-version [options]

Options:
  -h --help                Show this screen.

pld-upgrade: upgrade database

Upgrade the database to the latest version.

Usage:
  pld-upgrade [options]

Options:
  --from=<v>               Upgrade from a specific version, overriding
                           the version stored in the database.

  --to=<v>                 Upgrade to a specific version instead of the
                           latest version.

  -h --help                Show this screen.

See also

Upgrading

Upgrading the database

Changes between Palladium versions may require upgrading the model database or similar. These upgrades are handled automatically by the pld-upgrade script. Before running the script, make sure you’ve set the PALLADIUM_CONFIG environment variable, then simply run:

pld-upgrade

In some rare situations, it may be necessary to run the upgrade steps of specific versions only. pld-upgrade supports passing --from and --to options for that purpose. As an example, if you only want to run the upgrade steps between versions 0.9.1 and 1.0, this is how you’d invoke pld-upgrade:

pld-upgrade --from=0.9.1 --to=1.0

Upgrading the Database persister from version 0.9.1 to 1.0

Users of palladium.persistence.Database that are upgrading from version 0.9.1 to a more recent version (e.g. 1.0) are required to invoke pld-upgrade with an explicit --from version like so:

pld-upgrade --from=0.9.1

Backward incompatibilities in code

The development team makes an effort to keep the API backward compatible and to deprecate old code only gradually where necessary. However, some changes between major Palladium versions still introduce backward incompatibilities and may require you to update your Palladium plug-ins.

Breaking changes between 0.9.1 and 1.0

Backward incompatible changes between 0.9.1 and 1.0 are of concern only to users who have implemented their own version of palladium.server.PredictService.

palladium.server.PredictService.sample_from_request() has been replaced by the very similar sample_from_data(). The new method now accepts a data argument instead of request. data is the equivalent of the former request.args. Similarly, palladium.server.PredictService.params_from_request() has been replaced by params_from_data(). The latter now also accepts data instead of request, which again is the equivalent of the former request.data.

R support

Palladium has support for using DatasetLoader and Model objects that are programmed in the R programming language.

To use Palladium’s R support, you’ll have to install R and the Python rpy2 package.

An example is available in the examples/R folder in the source tree of Palladium (config.py, iris.R, iris.data). It contains an example of a very simple dataset loader and model implemented in R:

packages_needed <- c("randomForest")
packages_missing <-
  packages_needed[!(packages_needed %in% installed.packages()[,"Package"])]
if(length(packages_missing))
  install.packages(packages_missing, repos='http://cran.uni-muenster.de')

library(randomForest)

dataset <- function() {
    x <- iris[,1:4]
    y <- as.factor(iris[,5])
    list(x, y)
}

train.randomForest <- function(x, y) {
    randomForest(x, as.factor(y))
}

When configuring a dataset loader that is programmed in R, use the palladium.R.DatasetLoader. An example:

'dataset_loader_train': {
    '__factory__': 'palladium.R.DatasetLoader',
    'scriptname': 'iris.R',
    'funcname': 'dataset',
    },

The scriptname points to the R script that contains the function dataset.

R models are configured very similarly, using palladium.R.ClassificationModel:

'model': {
    '__factory__': 'palladium.R.ClassificationModel',
    'scriptname': 'iris.R',
    'funcname': 'train.randomForest',
    'encode_labels': True,
    },

The configuration options are the same as for DatasetLoader, except for the encode_labels option: when set to True, a sklearn.preprocessing.LabelEncoder is used to be able to deal with string target values. Thus ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'] will be visible to the R model as [0, 1, 2].
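The effect of encode_labels can be illustrated in plain Python. The sketch below mirrors what sklearn.preprocessing.LabelEncoder does (sort the unique classes and number them consecutively), without requiring scikit-learn:

```python
def encode_labels(y):
    """Map string labels to integers the way LabelEncoder does:
    sort the unique classes and number them 0..n-1."""
    classes = sorted(set(y))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[label] for label in y], classes


encoded, classes = encode_labels(
    ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])
# encoded is [0, 1, 2, 0]; the R model only ever sees the integers
```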

It is okay to use a DatasetLoader that is programmed in Python together with an R model.

Julia support

Palladium has support for using Model objects that are implemented in the Julia programming language.

To use Palladium’s Julia support, you’ll have to install Julia 0.3 or better and the julia Python package. You’ll also need to install the PyCall library in Julia:

$ julia -e 'Pkg.add("PyCall"); Pkg.update()'

The following example also relies on the SVM Julia package. This is how you can install it:

$ julia -e 'Pkg.add("StatsBase"); Pkg.add("SVM"); Pkg.update()'

Warning

The latest PyCall version from GitHub is known to have significant performance issues. It is recommended that you install revision 120fb03 instead. To do this on Linux, change into your ~/.julia/v0.3/PyCall directory and issue the necessary git checkout command:

cd ~/.julia/v0.3/PyCall
git checkout 120fb03

Let’s now take a look at the example on how to use a model written in Julia in the examples/julia folder in the source tree of Palladium (config.py, iris.data). The configuration in that example defines the model to be of type palladium.julia.ClassificationModel:

'model': {
    '__factory__': 'palladium.julia.ClassificationModel',
    'fit_func': 'SVM.svm',
    'predict_func': 'SVM.predict',
    }

There are two required arguments to ClassificationModel: the dotted path of the Julia function used for fitting, and the equivalent for the Julia function that does the prediction. The complete description of available parameters is given in the API docs:

class palladium.julia.AbstractModel(fit_func, predict_func, fit_kwargs=None, predict_kwargs=None, encode_labels=False)
Instantiates a model with the given *fit_func* and
*predict_func* written in Julia.

:param str fit_func:
  The dotted name of the Julia function to use for fitting.
  The function must take as its first two arguments the *X*
  and *y* arrays.  All elements of the optional *fit_kwargs*
  dictionary will be passed on to the Julia function as
  keyword arguments.  The return value of *fit_func* will be
  used as the first argument to *predict_func*.

:param str predict_func:
  Similar to *fit_func*, this is the dotted name of the Julia
  function used for prediction.  The first argument of this
  function is the return value of *fit_func*.  The second
  argument is the *X* data array.  All elements of the
  optional *predict_kwargs* dictionary will be passed on to the
  Julia function as keyword arguments.  The return value of
  *predict_func* is considered to be the target array *y*.

:param bool encode_labels:
  If set to *True*, the *y* target array will be automatically
  encoded using a :class:`sklearn.preprocessing.LabelEncoder`,
  which is useful if you have string labels but your Julia
  function only accepts numeric labels.

Advanced configuration

Configuration is an important part of every machine learning project. With Palladium, it is easy to separate code from configuration, and run code with different configurations.

Configuration files use Python syntax. For an introduction, please visit the Tutorial.

Palladium uses an environment variable called PALLADIUM_CONFIG to look up the location of the configuration file.

Variables

Configuration files have access to environment variables, which allows you to pass in things like database credentials from the environment:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.SQL',
    'url': 'mysql://{}:{}@localhost/test?encoding=utf8'.format(
        environ['DB_USER'], environ['DB_PASS'],
        ),
    'sql': 'SELECT ...',
    }

You also have access to here, which is the path to the directory that the configuration file lives in. In this example, we point the path variable to a file called data.csv inside of the same folder as the configuration:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.Table',
    'path': '{}/data.csv'.format(here),
    }

Multiple configuration files

In larger projects, it’s useful to split the configuration up into multiple files. Imagine you have a common config-data.py file and several config-model-X.py type files, each of which use the same data loader. When using multiple files, you must separate the filenames by commas: PALLADIUM_CONFIG=config-data.py,config-model-1.py.

If your configuration files share some entries (keys), then files coming later in the list will win and override entries from files earlier in the list. Thus, if the contents of config-data.py are {'a': 42, 'b': 6} and the contents of config-model-1.py are {'b': 7, 'c': 99}, the resulting configuration will be {'a': 42, 'b': 7, 'c': 99}.
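The override behaviour for top-level entries can be illustrated with an ordinary Python dict merge, where later files win key by key:

```python
config_data = {'a': 42, 'b': 6}     # contents of config-data.py
config_model_1 = {'b': 7, 'c': 99}  # contents of config-model-1.py

# Later files in PALLADIUM_CONFIG override earlier ones key by key:
merged = {**config_data, **config_model_1}
```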

Avoiding duplication in your configuration

Even with multiple files, you’ll sometimes end up repeating portions of configuration between files. The __copy__ directive allows you to copy or override existing entries. Imagine your dataset loaders for train and test are identical, except for the location of the CSV file:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.Table',
    'path': '{}/train.csv'.format(here),
    'many': '...',
    'more': {'...'},
    'entries': ['...'],
    }

'dataset_loader_test': {
    '__factory__': 'palladium.dataset.Table',
    'path': '{}/test.csv'.format(here),
    'many': '...',
    'more': {'...'},
    'entries': ['...'],
    }

With __copy__, you can reduce this down to:

'dataset_loader_train': {
    '__factory__': 'palladium.dataset.Table',
    'path': '{}/train.csv'.format(here),
    'many': '...',
    'more': {'...'},
    'entries': ['...'],
    }

'dataset_loader_test': {
    '__copy__': 'dataset_loader_train',
    'path': '{}/test.csv'.format(here),
    }

Reducing duplication in your configuration can help avoid errors.
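The semantics of the __copy__ directive can be sketched as a copy-then-override of plain dictionaries. This is an illustration of the behaviour, not Palladium's actual implementation:

```python
def resolve_copy(config, entry):
    """Resolve a '__copy__' directive: start from a copy of the
    referenced entry, then apply the local overrides on top."""
    local = dict(config[entry])
    source_name = local.pop('__copy__', None)
    if source_name is None:
        return local
    resolved = dict(config[source_name])
    resolved.update(local)
    return resolved


config = {
    'dataset_loader_train': {
        '__factory__': 'palladium.dataset.Table',
        'path': 'train.csv',
    },
    'dataset_loader_test': {
        '__copy__': 'dataset_loader_train',
        'path': 'test.csv',
    },
}

test_loader = resolve_copy(config, 'dataset_loader_test')
# test_loader keeps the copied __factory__ but uses path 'test.csv'
```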

Frequently asked questions

How do I contribute to Palladium?

Everyone is welcome to contribute to Palladium. You can help us to improve Palladium when you:

  • Use Palladium and give us feedback or submit bug reports to GitHub.
  • Improve existing code or documentation and send us a pull request on GitHub.
  • Suggest a new feature, and possibly send a pull request for it.

In case you intend to improve or to add code to Palladium, we kindly ask you to:

  • Include documentation and tests for new code.
  • Ensure that all existing tests still run successfully.
  • Ensure backward compatibility in the general case.

How do I configure where output is logged to?

Some commands, such as pld-fit, use Python’s own logging framework to print out useful information. Thus, you can configure which messages of which level are logged where. Maybe you don’t want to log to the console but to a file, or you don’t want to see debugging messages at all while using Palladium in production.

You can configure logging to suit your taste by adding a 'logging' entry to the configuration. The contents of this entry are expected to follow the logging configuration dictionary schema. An example of this dictionary-based logging configuration format can be found in the Python logging documentation.

How can I combine Palladium with my logging or monitoring solution?

As with adding authentication support, we suggest using the different pluggable decorator lists to send logging or monitoring messages to the corresponding systems. You need to implement decorators that wrap the different functions and send information to your logging or monitoring solution as needed. Every time one of the functions is called, the decorators in the decorator lists are also called and can thus generate logging messages. Assuming you have implemented the decorators my_app.log.predict, my_app.log.alive, my_app.log.fit, my_app.log.update_model, and my_app.log.load_data, you can add them to your application by adding the following parts to the configuration:

'predict_decorators': [
    'my_app.log.predict',
    ],

'alive_decorators': [
    'my_app.log.alive',
    ],

'update_model_decorators': [
    'my_app.log.update_model',
    ],

'fit_decorators': [
    'my_app.log.fit',
    ],

'load_data_decorators': [
    'my_app.log.load_data',
    ],

How can I use Python 3 without messing up my Python 2 projects?

If you currently use an older version of Python or even need this older version for other projects, you should take a look at virtual environments.

If you use the default Python version, you could use virtualenv:

  1. Install Python 3 if not yet available
  2. pip install virtualenv
  3. mkdir <virtual_env_folder>
  4. cd <virtual_env_folder>
  5. virtualenv -p /usr/local/bin/python3 palladium
  6. source <virtual_env_folder>/palladium/bin/activate

If you use Anaconda, you can use the conda environments which can be created and activated as follows:

  1. conda create -n palladium python=3 anaconda
  2. source activate palladium

Note

Palladium’s installation documentation for Anaconda already uses a virtual environment, including the requirements.txt.

After successfully activating the virtual environment, your shell prompt should be prefixed with (palladium). You can also check whether python --version points to the correct version. Now you can start installing Palladium.

Note

The environment has to be activated in each context in which you want to call Palladium scripts (e.g., in a shell). So if you run into problems finding the Palladium scripts or get errors about missing packages, check that you have activated the corresponding environment.

Where can I find information if there are problems installing numpy, scipy, or scikit-learn?

In the general case, the installation should work without problems if you are using Anaconda or have already installed these packages as provided with your operating system’s distribution. In case there are problems during installation, we refer you to the installation instructions of those projects.

How can I test Palladium components in a shell?

If you want to interactively check components of your Palladium configuration, you can access Palladium’s components as follows:

from palladium.util import initialize_config

config = initialize_config(__mode__='fit')
model = config['model']  # get model
X, y = config['dataset_loader_train']()  # load training data
# ...

You can also load the configuration to an interactive shell and access the components directly:

from code import InteractiveConsole
from pprint import pformat

from palladium.util import initialize_config

if __name__ == "__main__":
    config = initialize_config(__mode__='fit')
    banner = 'Palladium config:\n{}'.format(pformat(config))
    InteractiveConsole(config).interact(banner=banner)

In the interactive console, loading data and fitting a model can be done like this:

X, y = dataset_loader_train()
model.fit(X, y)

Note

Make sure, the PALLADIUM_CONFIG environment variable is pointing to a valid configuration file.

How can I access the active model in my code?

If you want to access the currently active model, you have to retrieve it via the process_store, or you have to load it using the model persister:

# via the process store:
from palladium.util import process_store
model = process_store.get('model')

# or via the model persister:
from palladium.util import get_config
model = get_config()['model_persister'].read()

Note

get_config()['model'] might not return the current active model as the entries in the configuration are not updated after initialization.

API Reference

If you are looking for information on a specific function, class or method, this part of the documentation is for you.

palladium package

Submodules

palladium.R module

palladium.cache module

The cache module provides caching utilities for faster access to data that is needed repeatedly. The disk cache (diskcache), which is primarily used during development, loads data from the local hard disk, which is faster than querying a remote database.

class palladium.cache.abstractcache(compute_key=None, ignore=False)

Bases: object

An abstract class for providing basic functionality for caching function calls. It contains the handling of keys used for caching objects.

palladium.cache.compute_key_attrs(attrs)
class palladium.cache.diskcache(*args, filename_tmpl=None, **kwargs)

Bases: palladium.cache.abstractcache

The disk cache stores results of function calls as pickled files to disk. Usually used during development and evaluation to save costly DB interactions in repeated calls with the same data.

Note: Should changes to the database or to your functions require you to purge existing cached values, you will find the cache files in the location defined by filename_tmpl.

class palladium.cache.picklediskcache(*args, filename_tmpl=None, **kwargs)

Bases: palladium.cache.diskcache

Same as diskcache, except that standard pickle is used instead of joblib’s pickle functionality.
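The idea behind these disk caches can be sketched in a few lines: pickle each call's result to a file whose name is derived from the arguments, and read it back on subsequent calls. This is a simplified illustration of the concept, not Palladium's implementation:

```python
import hashlib
import os
import pickle
import tempfile
from functools import wraps


def simple_diskcache(cache_dir=None):
    """Cache a function's return values as pickle files on disk."""
    cache_dir = cache_dir or tempfile.mkdtemp()

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            # Derive a file name from the function and its arguments:
            key = hashlib.md5(
                repr((func.__name__, args)).encode()).hexdigest()
            path = os.path.join(cache_dir, key + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Wrapping an expensive loader with such a decorator means that repeated calls with the same arguments read the pickled result from disk instead of, say, querying the database again.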


palladium.config module

class palladium.config.ComponentHandler(config)

Bases: object

class palladium.config.Config

Bases: dict

A dictionary that represents the app’s configuration.

Tries to send a more user-friendly message in case of KeyError.

class palladium.config.CopyHandler(configs)

Bases: object

class palladium.config.PythonHandler(config)

Bases: object

palladium.config.get_config(**extra)
palladium.config.initialize_config(**extra)
palladium.config.process_config(*configs, handlers0=<function _handlers_phase0>, handlers1=<function _handlers_phase1>, handlers2=<function _handlers_phase2>)

palladium.dataset module

DatasetLoader implementations.

class palladium.dataset.EmptyDatasetLoader

Bases: palladium.interfaces.DatasetLoader

This DatasetLoader can be used if no actual data should be loaded. Returns a (None, None) tuple.

class palladium.dataset.SQL(url, sql, target_column=None, ndarray=True, **kwargs)

Bases: palladium.interfaces.DatasetLoader

A DatasetLoader that uses pandas.io.sql.read_sql() to load data from an SQL database. Supports all databases that SQLAlchemy has support for.

class palladium.dataset.ScheduledDatasetLoader(impl, update_cache_rrule)

Bases: palladium.interfaces.DatasetLoader

A DatasetLoader that periodically loads data into RAM to make it available to the prediction server inside the process_store.

ScheduledDatasetLoader wraps another DatasetLoader class that it uses to do the actual loading of the data.

An update_cache_rrule is used to define how often data should be loaded anew.

This class’s read() method never calls the underlying dataset loader. It will only ever fetch the data from the in-memory cache.

cache = {'process_metadata': {}}
class palladium.dataset.Table(path, target_column=None, ndarray=True, **kwargs)

Bases: palladium.interfaces.DatasetLoader

A DatasetLoader that uses pandas.io.parsers.read_table() to load data from a file or URL.

palladium.eval module

Utilities for testing the performance of a trained model.

palladium.eval.list(model_persister)
palladium.eval.list_cmd(argv=sys.argv[1:])

List information about available models.

Uses the ‘model_persister’ from the configuration to display a list of models and their metadata.

Usage:
  pld-list [options]

Options:
  -h --help                  Show this screen.
palladium.eval.test(dataset_loader_test, model_persister, scoring=None, model_version=None)
palladium.eval.test_cmd(argv=sys.argv[1:])

Test a model.

Uses ‘dataset_loader_test’ and ‘model_persister’ from the configuration to load a test dataset to test the accuracy of a trained model with.

Usage:
  pld-test [options]

Options:
  -h --help                  Show this screen.

  --model-version=<version>  The version of the model to be tested. If
                             not specified, the newest model will be used.

palladium.fit module

palladium.interfaces module

Interfaces defining the behaviour of Palladium’s components.

class palladium.interfaces.CrossValidationGenerator

Bases: object

A CrossValidationGenerator provides train/test indices to split data in train and validation sets.

CrossValidationGenerator corresponds to the cross validation generator interface of scikit-learn.

class palladium.interfaces.DatasetLoader

Bases: object

A DatasetLoader is responsible for loading datasets for use in training and evaluation.

class palladium.interfaces.DatasetLoaderMeta(name, bases, attrs, **kwargs)

Bases: abc.ABCMeta

class palladium.interfaces.Model

Bases: Dummy

A Model can be fit() to data and can be used to predict() data.

Model corresponds to the estimators interface of scikit-learn.

fit(X, y=None)

Fit to data array *X* and possibly a target array *y*.

:return: self
predict(X, **kw)

Predict classes for data array *X* with shape n x m.

Some models may accept additional keyword arguments.

:return:
  A numpy array of length n with the predicted classes (for
  classification problems) or numeric values (for regression
  problems).

:raises:
  May raise a :class:`PredictError` to indicate that some
  condition made it impossible to deliver a prediction.
predict_proba(X, **kw)

Predict probabilities for data array *X* with shape n x m.

:return:
  A numpy array of length n x c with a list class
  probabilities per sample.

:raises:
  :class:`NotImplementedError` if not applicable.
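Because Model mirrors the scikit-learn estimator interface, a minimal implementation only needs fit() and predict(). Below is a standalone sketch (it does not subclass palladium.interfaces.Model) of a trivial majority-class classifier:

```python
import numpy as np

# Standalone sketch of the Model interface described above: a
# majority-class "classifier" implementing fit()/predict() in the
# scikit-learn estimator style (no palladium import).
class MajorityModel:
    def fit(self, X, y=None):
        # Remember the most frequent class seen during training.
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self  # fit() returns self, as the interface requires

    def predict(self, X):
        # Return a numpy array of length n with the predicted class.
        return np.full(len(X), self.majority_)

model = MajorityModel().fit([[0], [1], [2]], ['a', 'b', 'b'])
print(model.predict([[3], [4]]))
```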
class palladium.interfaces.ModelPersister

Bases: object

activate(version)

Set the model with the given *version* to be the active one.

Implies that any previously active model becomes inactive.

:param str version:
  The *version* of the model that's activated.

:raises:
  LookupError if no model with given *version* exists.
delete(version)

Delete the model with the given *version* from the database.

:param str version:
  The *version* of the model to be deleted.

:raises:
  LookupError if no model with given *version* exists.
list_models()

List metadata of all available models.

:return:
  A list of dicts, with each dict containing information about
  one of the available models.  Each dict is guaranteed to
  contain the ``version`` key, which is the same version
  number that :meth:`ModelPersister.read` accepts for loading
  specific models.
list_properties()

List properties of :class:`ModelPersister` itself.

:return:
  A dictionary of key and value pairs, where both keys and
  values are of type ``str``.  Properties will usually include
  ``active-model`` and ``db-version`` entries.
read(version=None)

Returns a :class:`Model` instance.

:param str version:
  *version* may be used to read a specific version of a model.
  If *version* is ``None``, returns the active model.

:return:
  The model object.

:raises:
  LookupError if no model was available.
upgrade(from_version=None, to_version=None)

Upgrade the underlying database to the latest version.

Newer versions of Palladium may require changes to the
:class:`ModelPersister`'s database.  This method provides an
opportunity to run the necessary upgrade steps.

It's the :class:`ModelPersister`'s responsibility to keep
track of the Palladium version that was used to create and
upgrade its database, and thus to determine the upgrade steps
necessary.
write(model)

Persists a :class:`Model` and returns a new version number.

It is the :class:`ModelPersister`'s responsibility to annotate
the 'version' information onto the model before it is saved.

The new model will initially be inactive.  Use
:meth:`ModelPersister.activate` to activate the model.

:return:
  The new model's version identifier.
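The contract above can be illustrated with a standalone, dict-backed sketch (not derived from palladium's base class, error handling kept minimal):

```python
# Standalone sketch of the ModelPersister contract: versions are
# assigned by write(), models start out inactive, activate() selects
# the model that read() returns by default.
class MemoryPersister:
    def __init__(self):
        self.models = {}   # version -> model object
        self.active = None

    def write(self, model):
        version = str(len(self.models) + 1)
        # Annotate version information onto the model before saving.
        model.__metadata__ = {'version': version}
        self.models[version] = model
        return version     # the new model is initially inactive

    def activate(self, version):
        if version not in self.models:
            raise LookupError(version)
        self.active = version

    def read(self, version=None):
        version = version or self.active
        if version not in self.models:
            raise LookupError("no model available")
        return self.models[version]

    def list_models(self):
        return [{'version': v} for v in self.models]
```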
class palladium.interfaces.ModelPersisterMeta(name, bases, attrs, **kwargs)

Bases: abc.ABCMeta

exception palladium.interfaces.PredictError(error_message, error_code=-1)

Bases: Exception

Raised by Model.predict() to indicate that some condition made it impossible to deliver a prediction.

class palladium.interfaces.PredictService

Bases: object

Responsible for producing the output for the ‘/predict’ HTTP endpoint.

palladium.interfaces.annotate(obj, metadata=None)

palladium.julia module

class palladium.julia.AbstractModel(fit_func, predict_func, fit_kwargs=None, predict_kwargs=None, encode_labels=False)

Bases: palladium.interfaces.Model

class palladium.julia.ClassificationModel(fit_func, predict_func, fit_kwargs=None, predict_kwargs=None, encode_labels=False)

Bases: palladium.julia.AbstractModel

palladium.julia.make_bridge()

palladium.persistence module

ModelPersister implementations.

class palladium.persistence.CachedUpdatePersister(impl, update_cache_rrule=None, check_version=True)

Bases: palladium.interfaces.ModelPersister

A ModelPersister that serves as a caching decorator around another palladium.interfaces.ModelPersister object.

Calls to read() will look up a model from the global process_store, i.e. there is never any actual loading involved when calling read().

To fill the process_store cache periodically with the return value of the underlying ModelPersister’s read method, pass update_cache_rrule, a dictionary of keyword arguments to dateutil.rrule.rrule. The cache will then be refilled according to that recurrence rule.

If no update_cache_rrule is used, the CachedUpdatePersister will call the underlying ModelPersister’s read method once during initialization and remember its return value.

cache = {'process_metadata': {}}
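A hypothetical configuration might cache a Database persister in memory and refresh the cached model nightly. The rrule keys mirror dateutil.rrule.rrule keyword arguments; the URL and schedule values are placeholders:

```python
# Hypothetical config: CachedUpdatePersister wrapping a Database
# persister, refreshing the in-memory model every day at 2:00.
# All concrete values are placeholders.
config = {
    'model_persister': {
        '__factory__': 'palladium.persistence.CachedUpdatePersister',
        'impl': {
            '__factory__': 'palladium.persistence.Database',
            'url': 'sqlite:///models.db',
        },
        'update_cache_rrule': {'freq': 'DAILY', 'byhour': 2},
    },
}
```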
class palladium.persistence.Database(url, poolclass=None, chunk_size=104857600, table_postfix='')

Bases: palladium.interfaces.ModelPersister

A ModelPersister that pickles models into an SQL database.

upgrade_steps = <palladium.persistence.UpgradeSteps object>
class palladium.persistence.DatabaseCLOB(url, poolclass=None, chunk_size=104857600, table_postfix='')

Bases: palladium.persistence.Database

A ModelPersister derived from Database, with only the slight difference of using CLOB instead of BLOB to store the pickle data.

Use when BLOB is not available.

class palladium.persistence.File(path)

Bases: palladium.persistence.FileLike

A ModelPersister that pickles models onto the file system, into a given directory.

class palladium.persistence.FileIO

Bases: palladium.persistence.FileLikeIO

class palladium.persistence.FileLike(path, io)

Bases: palladium.interfaces.ModelPersister

A ModelPersister that pickles models through file-like handles.

An argument io is used to access low level file handle operations.

upgrade_steps = <palladium.persistence.UpgradeSteps object>
class palladium.persistence.FileLikeIO

Bases: object

Used by FileLike to access low level file handle operations.

exists(path)

Test whether a path exists.

For normal files, the implementation is:

```python
return os.path.exists(path)
```
open(path, mode='r')

Return a file handle.

For normal files, the implementation is:

```python
return open(path, mode)
```
remove(path)

Remove a file.

For normal files, the implementation is:

```python
os.remove(path)
```
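Tying the three operations together, here is a standalone sketch of a FileLikeIO-style adapter that keeps "files" in a dict instead of on disk (text mode only; it mirrors the interface rather than subclassing palladium's actual base class):

```python
import io

class _WriteBuffer(io.StringIO):
    """StringIO that saves its contents into a store when closed."""
    def __init__(self, store, path):
        super().__init__()
        self._store, self._path = store, path

    def close(self):
        self._store[self._path] = self.getvalue()
        super().close()

# In-memory stand-in for the exists/open/remove trio described above.
class MemoryIO:
    def __init__(self):
        self.files = {}  # path -> file contents

    def exists(self, path):
        return path in self.files

    def open(self, path, mode='r'):
        if 'w' in mode:
            return _WriteBuffer(self.files, path)
        return io.StringIO(self.files[path])

    def remove(self, path):
        del self.files[path]
```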
class palladium.persistence.Rest(url, auth)

Bases: palladium.persistence.FileLike

class palladium.persistence.RestIO(auth)

Bases: palladium.persistence.FileLikeIO

class palladium.persistence.UpgradeSteps

Bases: object


palladium.server module

palladium.util module

Assorted utilities.

palladium.util.Partial(func, **kwargs)

Allows the use of partially applied functions in the configuration.
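Conceptually this is the same deferral that functools.partial provides: keyword arguments are fixed in the configuration, and the call happens later. A plain functools.partial shows the idea (the connect() function and its arguments below are illustrative only):

```python
from functools import partial

# Illustrative function; the real callable would come from the
# configuration (Palladium's Partial also accepts a dotted name).
def connect(url, timeout=10):
    return f"connected to {url} (timeout={timeout})"

# Fix the keyword arguments now, invoke later.
configured = partial(connect, url='db://example', timeout=5)
print(configured())  # -> connected to db://example (timeout=5)
```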

class palladium.util.PluggableDecorator(decorator_config_name)

Bases: object

class palladium.util.ProcessStore(*args, **kwargs)

Bases: collections.UserDict

class palladium.util.RruleThread(func, rrule, sleep_between_checks=60)

Bases: threading.Thread

Calls a given function at intervals defined by given recurrence rules (from dateutil.rrule).

palladium.util.apply_kwargs(func, **kwargs)

Call func with kwargs, but only those kwargs that it accepts.
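A minimal sketch of that behaviour, mirroring (not reproducing) palladium.util.apply_kwargs with inspect.signature:

```python
import inspect

# Pass only the keyword arguments the target function actually accepts;
# everything else is silently dropped.  (Sketch: a function that takes
# **kwargs itself would need extra handling.)
def apply_kwargs(func, **kwargs):
    accepted = inspect.signature(func).parameters
    return func(**{k: v for k, v in kwargs.items() if k in accepted})

def train(data, epochs=1):
    return (data, epochs)

# 'verbose' is dropped because train() does not accept it.
print(apply_kwargs(train, data='X', epochs=3, verbose=True))  # -> ('X', 3)
```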

palladium.util.args_from_config(func)

Decorator that injects parameters from the configuration.

palladium.util.get_metadata(error_code=0, error_message=None, status='OK')
palladium.util.memory_usage_psutil()

Return the current process memory usage in MB.

palladium.util.resolve_dotted_name(dotted_name)
palladium.util.run_job(func, **params)
palladium.util.session_scope(session)

Provide a transactional scope around a series of operations.
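The transactional-scope pattern behind this helper can be sketched as a context manager that commits on success and rolls back on error, shown here with a stand-in session object rather than a real SQLAlchemy session:

```python
from contextlib import contextmanager

# Sketch of the transactional-scope pattern: commit if the block
# succeeds, roll back and re-raise if it fails.
@contextmanager
def session_scope(session):
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise

# Stand-in session that records which path was taken.
class FakeSession:
    def __init__(self):
        self.log = []
    def commit(self):
        self.log.append('commit')
    def rollback(self):
        self.log.append('rollback')

s = FakeSession()
with session_scope(s):
    pass
print(s.log)  # -> ['commit']
```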

palladium.util.timer(log=None, message=None)
palladium.util.upgrade(model_persister, from_version=None, to_version=None)
palladium.util.upgrade_cmd(argv=sys.argv[1:])

Upgrade the database to the latest version.

Usage:
pld-upgrade [options]
Options:
--from=<v> Upgrade from a specific version, overriding the version stored in the database.
--to=<v> Upgrade to a specific version instead of the latest version.

-h --help Show this screen.

palladium.util.version_cmd(argv=sys.argv[1:])

Print the version number of Palladium.

Usage:
pld-version [options]
Options:
-h --help Show this screen.

palladium.wsgi module

Module contents
