Welcome to MLPy’s documentation!

MLPy


A Machine Learning library for Python

Features

  • TODO

Installation

At the command line:

$ easy_install mlpy

Or, if you have virtualenvwrapper installed:

$ mkvirtualenv mlpy
$ pip install mlpy

Usage

To use MLPy in a project:

import mlpy

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/evenmarbles/mlpy/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.

Write Documentation

MLPy could always use more documentation, whether as part of the official MLPy docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/evenmarbles/mlpy/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up mlpy for local development.

  1. Fork the mlpy repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/mlpy.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv mlpy
    $ cd mlpy/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 mlpy tests
    $ python setup.py test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
  3. The pull request should work for Python 2.6, 2.7, 3.3, and 3.4, and for PyPy. Check https://travis-ci.org/evenmarbles/mlpy/pull_requests and make sure that the tests pass for all supported Python versions.

Tips

To run a subset of tests:

$ python -m unittest tests.test_mlpy

Credits

Development Lead

Contributors

None yet. Why not be the first?

History

0.1.0 (2015-08-11)

  • First release on PyPI.

Agent design (mlpy.agents)

This module contains functionality for designing agents navigating inside an Environment.

Control of the agents is specified by an agent module which is handled by the Agent base class.

An agent class deriving from Agent can also make use of a finite state machine (FSM) to control the agent’s behavior and a world model to maintain a notion of the current state of the world.

Agents

Agent The agent base class.
AgentModuleFactory The agent module factory.
IAgentModule Agent module base interface class.
LearningModule Learning agent module.
FollowPolicyModule The follow policy agent module.
UserModule The user agent module.

World Model

WorldObject The world object base class.
WorldModel The world model.

Finite State Machine

Event Transition event definition.
EmptyEvent A no-op transition event.
FSMState State base class.
Transition Transition class.
OnUpdate OnUpdate class.
StateMachine The finite state machine.

Auxiliary functions (mlpy.auxiliary)

This module provides auxiliary functions and data structures used throughout the library.

Array

accum An accumulation function similar to Matlab’s accumarray function.
normalize Normalize the input array to sum to 1.
nunique Efficiently count the unique elements of x along the given axis.
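The array helpers above can be illustrated with a minimal NumPy sketch of `normalize` (the implementation below is illustrative, not mlpy's actual code):

```python
import numpy as np

def normalize(a, axis=None):
    """Rescale `a` so it sums to 1, optionally along an axis (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    # keep the reduced axis so the division broadcasts when an axis is given
    s = a.sum(axis=axis, keepdims=axis is not None)
    return a / s

print(normalize([1.0, 3.0]))  # -> [0.25 0.75]
```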

I/O

import_module_from_path Import a module from a file path and return the module object.
load_from_file Load data from file.
save_to_file Save data to file.
is_pickle Check if the file with the given name is pickle encoded.
txt2pickle Convert a text file into a pickle-encoded file.

Data structures

Array The managed array class.
Point2D The 2d-point class.
Point3D The 3d-point class.
Vector3D The 3d-vector class.
Queue The abstract queue base class.
FIFOQueue The first-in-first-out (FIFO) queue.
PriorityQueue The priority queue.

Data sets

DataSet The data set.

Miscellaneous

remove_key Safely remove a key from a dictionary.
listify Ensure that the object obj is of type list.
stdout_redirected Prevent a C shared library from printing to stdout.

Plotting

Arrow3D A 3d-arrow annotation for matplotlib plots.

Clustering package (mlpy.cluster)

K-means clustering

kmeans Hard cluster data using kmeans.
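The `kmeans` routine above performs hard clustering; a self-contained sketch of Lloyd's algorithm (assumed here to match the general approach, though mlpy's signature and options may differ):

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Minimal Lloyd's algorithm sketch: returns (centers, labels)."""
    rng = np.random.RandomState(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center (squared Euclidean distance)
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

# two well-separated blobs
x = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
centers, labels = kmeans(x, 2)
```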

Constants (mlpy.constants)

Mathematical constants and units used in artificial intelligence.

Mathematical constants

epsilon 10^{-5}
infty 10^{308}

Units

SI prefixes

micro 10^{-6}

Environments (mlpy.environments)

Environment The environment base class.

Gridworld

Cell The abstract cell module.
GridWorld A gridworld consisting of a 2d-grid.

Nao

NaoEnvFactory The NAO environment factory.
PhysicalWorld The physical (real) environment.
Webots Simulated environment using the Webots simulator.

Experiment Infrastructure (mlpy.experiments)

Experiment The experiment class.

Tasks

Task The task description base class.
EpisodicTask The episodic task description base class.
SearchTask The abstract class for a search task definition.

Knowledge representations (mlpy.knowledgerep)

Case base reasoning (mlpy.knowledgerep.cbr)

Engine

CaseMatch Case match information.
Case The representation of a case in the case base.
CaseBaseEntry The case base entry class.
CaseBase The case base engine.

Features

FeatureFactory The feature factory.
Feature The abstract feature class.
BoolFeature The boolean feature.
StringFeature The string feature.
IntFeature The integer feature.
FloatFeature The float feature.

Similarity measures

Stat The similarity statistics container.
SimilarityFactory The similarity factory.
ISimilarity The similarity model interface.
NeighborSimilarity The neighborhood similarity model.
KMeansSimilarity The KMeans similarity model.
ExactMatchSimilarity The exact match similarity model.
CosineSimilarity The cosine similarity model.
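As a reference for the similarity models listed above, cosine similarity is the standard normalized dot product; a minimal sketch (not mlpy's actual class, which presumably operates on case features):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: u.v / (|u| |v|)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity([1, 0], [0, 1]))  # orthogonal vectors -> 0.0
```

Parallel vectors score 1.0 regardless of magnitude, which is why cosine similarity is a common choice for case retrieval.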

Problem solving methods

CBRMethodFactory The case base reasoning factory.
ICBRMethod The method interface.
IReuseMethod The reuse method interface.
IRevisionMethod The revision method interface.
IRetentionMethod The retention method interface.
DefaultReuseMethod The default reuse method implementation.
DefaultRevisionMethod The default revision method implementation called from the case base.
DefaultRetentionMethod The default retention method implementation called from the case base.

Learning algorithms (mlpy.learners)

LearnerFactory The learner factory.
ILearner The learner interface.

Online learners (mlpy.learners.online)

IOnlineLearner The online learner base class.

Reinforcement learning

RLLearner The reinforcement learning learner interface.
QLearner Performs q-learning.
RLDTLearner Performs reinforcement learning using decision trees.
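The core of Q-learning, as presumably performed by `QLearner`, is the tabular temporal-difference update; a hypothetical sketch (names and data layout are illustrative, not mlpy's API):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
q_update(Q, 0, "right", 1.0, 1)
print(Q[0]["right"])  # -> 0.1
```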

Offline learners (mlpy.learners.offline)

IOfflineLearner The offline learner base class.

Inverse reinforcement learning

ApprenticeshipLearner The apprenticeship learner.
IncrApprenticeshipLearner Incremental apprenticeship learner.

Markov decision process (MDP) (mlpy.mdp)

Transition and reward models

MDPModelFactory The Markov decision process (MDP) model factory.
IMDPModel The Markov decision process interface.

Discrete models

DiscreteModel The MDP model for discrete states and actions.
DecisionTreeModel The MDP model for discrete states and actions realized with decision trees.

Model explorer

ExplorerFactory The model explorer factory.
RMaxExplorer RMax-based exploration base class.
LeastVisitedBonusExplorer Least visited bonus explorer, an RMax-based exploration model.
UnknownBonusExplorer Unknown bonus explorer, an RMax-based exploration model.

Continuous models

casml Continuous Action and State Model Learner (CASML)

Probability distributions

ProbaCalcMethodFactory The probability calculation method factory.
IProbaCalcMethod The probability calculation method interface.
DefaultProbaCalcMethod The default probability calculation method.
ProbabilityDistribution Probability Distribution.

State and action information

Experience Experience base class.
RewardFunction The reward function.
StateActionInfo The models interface.
StateData State information interface.
MDPPrimitive A Markov decision process primitive.
State Representation of the state.
Action Representation of an action.

Modules and design patterns (mlpy.modules)

This module contains various modules and design patterns.

Modules

UniqueModule Class ensuring each instance has a unique name.
Module Base module class from which most modules inherit.

Patterns

Borg Class ensuring that all instances share the same state.
Observable The observable base class.
Listener The listener interface.

Meta classes

Singleton Metaclass ensuring only one instance of the class exists.
RegistryInterface Metaclass registering all subclasses derived from a given class.

Optimization tools (mlpy.optimize)

Algorithms

EM Expectation-Maximization module base class.

Utilities

is_converged Check if an objective function has converged.
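A common form of such a convergence check compares the relative change of the objective against a threshold; a sketch (mlpy's actual criterion and signature may differ):

```python
def is_converged(f_new, f_old, thresh=1e-4):
    """Relative-change convergence test (illustrative sketch)."""
    delta = abs(f_new - f_old)
    # small constant guards against division by zero when both values are ~0
    avg = (abs(f_new) + abs(f_old) + 1e-10) / 2.0
    return delta / avg < thresh

print(is_converged(1.00001, 1.0))  # -> True
```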

Planning tools (mlpy.planners)

Explorers

ExplorerFactory The explorer factory.
IExplorer The explorer interface class.

Discrete explorers

DiscreteExplorer The discrete explorer base class.
EGreedyExplorer The \epsilon-greedy explorer.
SoftmaxExplorer The softmax explorer.
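The \epsilon-greedy policy behind `EGreedyExplorer` picks a random action with probability \epsilon and the greedy action otherwise; a minimal sketch (illustrative, not mlpy's interface):

```python
import random

def egreedy(q_values, epsilon=0.1, rng=random):
    """Return a random action index with prob. epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(egreedy([0.1, 0.9, 0.3], epsilon=0.0))  # epsilon=0 is always greedy -> 1
```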

Planners

IPlanner The planner interface class.

Discrete planners

ValueIteration Planning through value iteration.
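Value iteration repeatedly applies the Bellman optimality backup until the value function stops changing; a self-contained sketch on a toy MDP (the data layout here is hypothetical, not mlpy's representation):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] -> list of (prob, s'), R[s][a] -> reward. Returns state values."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman backup: best expected one-step return over actions
            v_new = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                        for a in P[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V

# Two-state chain: from 0, "go" reaches 1 (reward 1); state 1 is absorbing.
P = {0: {"go": [(1.0, 1)]}, 1: {"stay": [(1.0, 1)]}}
R = {0: {"go": 1.0}, 1: {"stay": 0.0}}
V = value_iteration(P, R)
```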

Statistical functions (mlpy.stats)

Discrete distributions

mlpy.stats.nonuniform

A non-uniform discrete random variable.

mlpy.stats.gibbs

A Gibbs distribution discrete random variable.

nonuniform A non-uniform discrete random variable.
gibbs A Gibbs distribution discrete random variable.

Conditional distributions

mlpy.stats.conditional_normal

mlpy.stats.conditional_normal = <mlpy.stats._conditional.conditional_normal_gen object>

mlpy.stats.conditional_student

mlpy.stats.conditional_student = <mlpy.stats._conditional.conditional_student_gen object>

mlpy.stats.conditional_mix_normal

mlpy.stats.conditional_mix_normal = <mlpy.stats._conditional.conditional_mix_normal_gen object>

conditional_normal Conditional Normal random variable.
conditional_student Conditional Student random variable.
conditional_mix_normal Conditional Mix-Normal random variable.

Multivariate distributions

mlpy.stats.multivariate_normal

mlpy.stats.multivariate_normal = <mlpy.stats._multivariate.multivariate_normal_gen object>

mlpy.stats.multivariate_student

mlpy.stats.multivariate_student = <mlpy.stats._multivariate.multivariate_student_gen object>

mlpy.stats.invwishart

mlpy.stats.invwishart = <mlpy.stats._multivariate.invwishart_gen object>

mlpy.stats.normal_invwishart

mlpy.stats.normal_invwishart = <mlpy.stats._multivariate.normal_invwishart_gen object>

multivariate_normal Multivariate Normal random variable.
multivariate_student Multivariate Student random variable.
invwishart Inverse Wishart random variable.
normal_invwishart Normal-Inverse Wishart random variable.

Statistical Models

mlpy.stats.models.markov

mlpy.stats.models.markov = <mlpy.stats.models._basic.markov_gen object>

markov Markov model.

Mixture Models

MixtureModel Mixture model base class.
DiscreteMM Discrete mixture model class.
GMM Gaussian mixture model class.
StudentMM Student mixture model class.

Statistical functions

mlpy.stats.canonize_labels

mlpy.stats.canonize_labels(labels, support=None)[source]

Transform labels to 1:k.

The size of the canonized array is the same as labels, but every label is transformed to its corresponding value in 1:k. If labels does not span the support, specify the support explicitly as the second argument.

Parameters:

labels : array_like

support : optional

Returns:

Transformed labels.

Examples

>>> canonize_labels()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT

Warning

This is only a stub function. The implementation is still missing.
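Since the function is still a stub, here is a sketch of the intended behavior as the description above implies it (NumPy-based; not the eventual mlpy implementation):

```python
import numpy as np

def canonize_labels(labels, support=None):
    """Map arbitrary labels to 1..k (illustrative sketch of the stub above)."""
    support = np.unique(labels) if support is None else np.asarray(support)
    lookup = {v: i + 1 for i, v in enumerate(support)}
    return np.array([lookup[v] for v in np.asarray(labels).ravel()])

print(canonize_labels(["b", "a", "b"]))  # -> [2 1 2]
```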

mlpy.stats.is_posdef

mlpy.stats.is_posdef(a)[source]

Test if matrix a is positive definite.

The method uses Cholesky decomposition to determine if the matrix is positive definite.

Parameters:

a : ndarray

A matrix.

Returns:

bool :

Whether the matrix is positive definite.

Examples

>>> is_posdef()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT
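The Cholesky-based test described above can be sketched as follows (a standard idiom; mlpy's implementation may differ in detail):

```python
import numpy as np

def is_posdef(a):
    """True if `a` is positive definite: Cholesky succeeds iff a is PD."""
    try:
        np.linalg.cholesky(a)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_posdef(np.eye(3)))   # -> True
print(is_posdef(-np.eye(3)))  # -> False
```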

mlpy.stats.normalize_logspace

mlpy.stats.normalize_logspace(a)[source]

Normalizes the array a in the log domain.

Each row of a is a log discrete distribution. Returns the array normalized in the log domain while minimizing the possibility of numerical underflow.

Parameters:

a : ndarray

The array to normalize in the log domain.

Returns:

a : ndarray

The array normalized in the log domain.

lnorm : float

log normalization constant.

Examples

>>> normalize_logspace()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT
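Normalizing in the log domain while avoiding underflow is usually done with the log-sum-exp trick; a sketch of the idea (return convention assumed from the docs above):

```python
import numpy as np

def normalize_logspace(a):
    """Normalize each row of log-probabilities via the log-sum-exp trick."""
    a = np.atleast_2d(np.asarray(a, float))
    # subtract the row max before exponentiating so exp() cannot underflow to 0
    m = a.max(axis=1, keepdims=True)
    lnorm = m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))
    return a - lnorm, lnorm.ravel()

rows, lnorm = normalize_logspace(np.log([[0.2, 0.2]]))
```

After normalization each row exponentiates to a proper distribution summing to 1.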

mlpy.stats.partitioned_cov

mlpy.stats.partitioned_cov(x, y, c=None)[source]

Covariance of groups.

Partition the rows of x according to class labels in y and take the covariance of each group.

Parameters:

x : array_like, shape (n, dim)

The data to group, where n is the number of data points and dim is the dimensionality of each data point.

y : array_like, shape (n,)

The class label for each data point.

c : int

The number of components in y.

Returns:

cov : array_like

The covariance of each group.

Examples

>>> partitioned_cov()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT

Warning

Implementation of this function is not finished yet.

mlpy.stats.partitioned_mean

mlpy.stats.partitioned_mean(x, y, c=None, return_counts=False)[source]

Mean of groups.

Groups the rows of x according to the class labels in y and takes the mean of each group.

Parameters:

x : array_like, shape (n, dim)

The data to group, where n is the number of data points and dim is the dimensionality of each data point.

y : array_like, shape (n,)

The class label for each data point.

return_counts : bool

Whether to return the number of elements in each group or not.

Returns:

mean : array_like

The mean of each group.

counts : int

The number of elements in each group.

Examples

>>> partitioned_mean()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT
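The grouped-mean computation described above can be sketched with NumPy boolean masks (illustrative; mlpy's version also takes `c` and `return_counts`):

```python
import numpy as np

def partitioned_mean(x, y):
    """Mean of the rows of x for each class label in y (illustrative sketch)."""
    labels = np.unique(y)
    # one mean row per distinct label, in sorted label order
    means = np.vstack([x[y == lab].mean(axis=0) for lab in labels])
    return means, labels

x = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
y = np.array([0, 0, 1])
means, labels = partitioned_mean(x, y)
print(means)  # -> [[ 1.  1.] [10. 10.]]
```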

mlpy.stats.partitioned_sum

mlpy.stats.partitioned_sum(x, y, c=None)[source]

Sums of groups.

Groups the rows of x according to the class labels in y and sums each group.

Parameters:

x : array_like, shape (n, dim)

The data to group, where n is the number of data points and dim is the dimensionality of each data point.

y : array_like, shape (n,)

The class label for each data point.

c : int

The number of components in y.

Returns:

sums : array_like

The sum of each group.

Examples

>>> partitioned_sum()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT

mlpy.stats.randpd

mlpy.stats.randpd(dim)[source]

Create a random positive definite matrix of size dim-by-dim.

Parameters:

dim : int

The dimension of the matrix to create.

Returns:

ndarray :

A dim-by-dim positive definite matrix.

Examples

>>> randpd()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT
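A standard way to build such a matrix is A @ A.T (positive semidefinite by construction) plus a diagonal shift to guarantee strict positive definiteness; a sketch (mlpy's construction may differ):

```python
import numpy as np

def randpd(dim, seed=None):
    """Random dim-by-dim positive definite matrix (illustrative sketch)."""
    rng = np.random.RandomState(seed)
    a = rng.randn(dim, dim)
    # A @ A.T is PSD; adding dim * I pushes all eigenvalues strictly positive
    return a @ a.T + dim * np.eye(dim)

m = randpd(4, seed=0)
```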

mlpy.stats.shrink_cov

mlpy.stats.shrink_cov(x, return_lambda=False, return_estimate=False)[source]

Covariance shrinkage estimation.

Ledoit-Wolf optimal shrinkage estimator for cov(X), C = \lambda * t + (1 - \lambda) * s, using the diagonal variance target t = np.diag(s) with the unbiased sample covariance s as the unconstrained estimate.

Parameters:

x : array_like, shape (n, dim)

The data, where n is the number of data points and dim is the dimensionality of each data point.

return_lambda : bool

Whether to return lambda or not.

return_estimate : bool

Whether to return the unbiased estimate or not.

Returns:

C : array

The shrunk final estimate

lambda_ : float, optional

Lambda

estimate : array, optional

Unbiased estimate.

Examples

>>> shrink_cov()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT

mlpy.stats.sq_distance

mlpy.stats.sq_distance(p, q, p_sos=None, q_sos=None)[source]

Efficiently compute squared Euclidean distances between sets of vectors.

Compute the squared Euclidean distances between every d-dimensional point in p and every d-dimensional point in q. Both p and q are n-points-by-dim arrays.

Parameters:

p : array_like, shape (n, dim)

Array where n is the number of points and dim is the number of dimensions.

q : array_like, shape (n, dim)

Array where n is the number of points and dim is the number of dimensions.

p_sos : array_like, shape (dim,)

q_sos : array_like, shape (dim,)

Returns:

ndarray :

The squared Euclidean distance.

Examples

>>> sq_distance()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT
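The efficiency comes from the expansion |p - q|^2 = |p|^2 - 2 p.q + |q|^2, which replaces the pairwise loop with one matrix product; a sketch (the optional `p_sos`/`q_sos` arguments above presumably cache the sums of squares):

```python
import numpy as np

def sq_distance(p, q):
    """All pairwise squared Euclidean distances via |p|^2 - 2 p.q + |q|^2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p_sos = (p ** 2).sum(axis=1)[:, None]   # column of |p_i|^2
    q_sos = (q ** 2).sum(axis=1)[None, :]   # row of |q_j|^2
    return p_sos - 2.0 * p @ q.T + q_sos

d = sq_distance([[0.0, 0.0], [1.0, 0.0]], [[0.0, 3.0]])
print(d)  # -> [[ 9.] [10.]]
```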

mlpy.stats.stacked_randpd

mlpy.stats.stacked_randpd(dim, k, p=0)[source]

Create stacked positive definite matrices.

Create multiple random positive definite matrices of size dim-by-dim and stack them.

Parameters:

dim : int

The dimension of each matrix.

k : int

The number of matrices.

p : int

The diagonal value of each matrix.

Returns:

ndarray :

Multiple stacked random positive definite matrices.

Examples

>>> stacked_randpd()

Note

Adapted from Matlab:

Copyright (2010) Kevin Murphy and Matt Dunham
License: MIT

is_posdef Test if matrix a is positive definite.
randpd Create a random positive definite matrix.
stacked_randpd Create multiple random positive definite matrices.
normalize_logspace Normalize in log space while avoiding numerical underflow.
sq_distance Efficiently compute squared Euclidean distances between sets of vectors.
partitioned_cov Partition the rows of x according to y and take the covariance of each group.
partitioned_mean Group the rows of x according to the class labels in y and take the mean of each group.
partitioned_sum Group the rows of x according to the class labels in y and sum each group.
shrink_cov Ledoit-Wolf optimal shrinkage estimator.
canonize_labels Transform labels to 1:k.

Dynamic Bayesian networks (mlpy.stats.dbn)

hmm Hidden Markov Models

Tools (mlpy.tools)

ConfigMgr The configuration manager.
LoggingMgr The logging manager Singleton class.
Waiting The waiting class.
