Ckan API client documentation

Clients

There are currently three clients for the Ckan API, each providing a different level of abstraction and therefore suited to different needs:

  • CkanLowlevelClient – just a wrapper around the API.
  • High-level client: provides more abstraction around the CRUD methods.
  • Syncing client: provides facilities for “syncing” a collection of objects into Ckan.

Low-level

This is the client providing the lowest level of abstraction.

class ckan_api_client.low_level.CkanLowlevelClient(base_url, api_key=None)[source]

Ckan low-level client.

  • Handles authentication and response validation
  • Handles request body serialization and response body deserialization
  • Raises HTTPError exceptions on failed HTTP requests
  • Performs some checks on return values from the API
anonymous

Property, returning a copy of this client, without an api_key set

request(method, path, **kwargs)[source]

Wrapper around requests.request().

Extra functionality provided:

  • Add Authorization header to requests
  • If data is an object, serialize it with json and add the Content-type: application/json header.
  • If the response didn’t contain an “ok” code, raises a HTTPError exception.
Parameters:
  • method – HTTP method to be used
  • path – Path, relative to the Ckan root. For example: /api/3/action/package_list
  • headers – HTTP headers to be added to the request
  • data – Data to be sent in the request body
  • kwargs – Extra keyword arguments will be passed directly to the requests.request() call.
Raises:

ckan_api_client.exceptions.HTTPError – in case the HTTP request returned a non-ok status code

Returns:

a requests response object
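
The header and body handling listed above can be sketched as a pure function. This is only an illustration, not the library's code: the prepare_request name is hypothetical, and sending the API key verbatim in the Authorization header is an assumption.

```python
import json


def prepare_request(api_key=None, data=None, headers=None):
    """Hypothetical helper: build the headers and body for a request."""
    headers = dict(headers or {})
    if api_key is not None:
        # Assumption: the key is sent as-is in the Authorization header.
        headers['Authorization'] = api_key
    if data is not None and not isinstance(data, (str, bytes)):
        # Objects are serialized to JSON and labelled as such.
        data = json.dumps(data)
        headers['Content-type'] = 'application/json'
    return headers, data


headers, body = prepare_request(api_key='my-key', data={'name': 'example'})
```

In the real client the resulting headers and body would then be handed to requests.request(), and a non-ok status would be turned into an HTTPError.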

list_datasets()[source]

Return a list of all dataset ids

iter_datasets()[source]

Generator yielding dataset objects, iterating over the whole database.

get_dataset(dataset_id)[source]

Get a dataset, using API v2

Parameters:dataset_id – ID of the requested dataset
Returns:a dict containing the data as returned from the API
Return type:dict
post_dataset(dataset)[source]

POST a dataset, using API v2 (usually for creation)

Parameters:dataset (dict) – a dict containing data to be sent to Ckan. Should not already contain an id
Returns:a dict containing the data as returned from the API
Return type:dict
put_dataset(dataset)[source]

PUT a dataset, using API v2 (usually for update)

Parameters:dataset (dict) – a dict containing data to be sent to Ckan. Must contain an id, that will be used to build the URL
Returns:a dict containing the updated dataset as returned from the API
Return type:dict
delete_dataset(dataset_id, ignore_404=True)[source]

DELETE a dataset, using API v2

Parameters:
  • dataset_id – id of the dataset to be deleted
  • ignore_404 (bool) – if True (the default), will simply ignore http 404 errors from the API
list_groups()[source]
iter_groups()[source]
get_group(group_id)[source]
post_group(group)[source]
put_group(group)[source]
delete_group(group_id, ignore_404=True)[source]
list_organizations()[source]
iter_organizations()[source]
get_organization(id)[source]
post_organization(organization)[source]
put_organization(organization)[source]

Warning! with api v3 we need to use POST!

delete_organization(id, ignore_404=True)[source]
list_licenses()[source]

High-level

class ckan_api_client.high_level.CkanHighlevelClient(base_url, api_key=None, fail_on_inconsistency=False)[source]

High-level client, handling CRUD of objects.

This class only returns / handles CkanObjects, to make sure we are handling consistent data (they have validators in place)

Parameters:
  • base_url – Base URL for the Ckan instance.
  • api_key – API key to be used when accessing Ckan. This is required for writing.
  • fail_on_inconsistency – Whether to fail on “inconsistencies” (mismatching updated objects). This is especially useful during development, to catch problems with the client itself (or new bugs in Ckan).
list_datasets()[source]
Returns:a list of dataset ids
iter_datasets()[source]

Generator, iterating over all the datasets in ckan

get_dataset(id, allow_deleted=False)[source]

Get a specific dataset, by id

Note

Since the Ckan API accepts both ids and names as keys, get_dataset() and get_dataset_by_name() perform exactly the same request in the background.

The difference is only in the high-level handling: each function checks that the returned object has the expected id (or name), and raises an HTTPError(404, ..) otherwise.

Parameters:
  • id (str) – the dataset id
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanDataset
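
The id / state checks described in the note and parameters above might look roughly like the following. This is a sketch under assumptions: check_fetched is a hypothetical helper, the HTTPError class is a local stand-in for ckan_api_client.exceptions.HTTPError, and plain dicts are used in place of CkanDataset objects.

```python
class HTTPError(Exception):
    """Stand-in for ckan_api_client.exceptions.HTTPError."""

    def __init__(self, status_code, message):
        super().__init__(status_code, message)
        self.status_code = status_code
        self.message = message


def check_fetched(obj, key, expected, allow_deleted=False):
    """Hypothetical helper: validate a dict fetched by id or by name."""
    if obj.get(key) != expected:
        # The API matched on the other key (e.g. a name passed as an id).
        raise HTTPError(404, 'no object with {0}={1!r}'.format(key, expected))
    if not allow_deleted and obj.get('state') != 'active':
        raise HTTPError(404, 'object is not active')
    return obj


fetched = {'id': 'abc', 'name': 'my-dataset', 'state': 'active'}
by_id = check_fetched(fetched, 'id', 'abc')

try:
    # A name passed where an id is expected is rejected.
    check_fetched(fetched, 'id', 'my-dataset')
    mismatch_status = None
except HTTPError as e:
    mismatch_status = e.status_code
```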

get_dataset_by_name(name, allow_deleted=False)[source]

Get a specific dataset, by name

Note

See note on get_dataset()

Parameters:
  • name (str) – the dataset name
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanDataset

save_dataset(dataset)[source]

If the dataset already has an id, call update_dataset(), otherwise, call create_dataset().

Returns:as returned by the called function.
Return type:CkanDataset
create_dataset(dataset)[source]

Create a dataset

Return type:CkanDataset
update_dataset(dataset)[source]

Update a dataset

Return type:CkanDataset
delete_dataset(id)[source]

Delete a dataset, by id

wipe_dataset(id)[source]

Actually delete a dataset, by renaming it first

list_organizations()[source]
list_organization_names()[source]
iter_organizations()[source]
get_organization(id, allow_deleted=False)[source]

Get organization, by id.

Note

See note on get_dataset()

Parameters:
  • id (str) – the organization id
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanOrganization

get_organization_by_name(name, allow_deleted=False)[source]

Get organization by name.

Note

See note on get_dataset()

Parameters:
  • name (str) – the organization name
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanOrganization

save_organization(organization)[source]
create_organization(organization)[source]

Create an organization

Return type:CkanOrganization
update_organization(organization)[source]
Return type:CkanOrganization
delete_organization(id)[source]
list_groups()[source]
list_group_names()[source]
iter_groups()[source]
get_group(id, allow_deleted=False)[source]

Get group, by id.

Note

See note on get_dataset()

Parameters:
  • id (str) – the group id
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanGroup

get_group_by_name(name, allow_deleted=False)[source]

Get group by name.

Note

See note on get_dataset()

Parameters:
  • name (str) – the group name
  • allow_deleted – Whether to return even logically deleted objects. If set to False (the default) will raise a HTTPError(404, ..) if state != 'active'
Return type:

CkanGroup

save_group(group)[source]
create_group(group)[source]
Return type:CkanGroup
update_group(group)[source]
Return type:CkanGroup
delete_group(id)[source]

Synchronization

class ckan_api_client.syncing.SynchronizationClient(base_url, api_key=None, **kw)[source]

Synchronization client, providing functionality for importing collections of datasets into a Ckan instance.

Synchronization acts as follows:

  • Ensure all the required organizations/groups are there; create a map between “source” ids and Ckan ids. Optionally update existing organizations/groups with new details.
  • Find all the Ckan datasets matching the source_name
  • Determine which datasets...
    • ...need to be created
    • ...need to be updated
    • ...need to be deleted
  • First, delete datasets to be deleted in order to free up names
  • Then, create datasets that need to be created
  • Lastly, update datasets using the configured merge strategy (see constructor arguments).
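
The "determine which datasets" step above amounts to a set comparison between the ids coming from the source and the ids already in Ckan. A minimal sketch, assuming both sides are available as collections of source ids (plan_sync is a hypothetical name, not part of the client):

```python
def plan_sync(source_ids, existing_ids):
    """Hypothetical helper: split ids into create / update / delete sets."""
    source_ids, existing_ids = set(source_ids), set(existing_ids)
    return {
        'create': source_ids - existing_ids,   # only in the source
        'update': source_ids & existing_ids,   # present on both sides
        'delete': existing_ids - source_ids,   # no longer in the source
    }


plan = plan_sync(source_ids=['a', 'b', 'c'], existing_ids=['b', 'c', 'd'])

# Operations are applied in the order described above.
order = ['delete', 'create', 'update']
```

Deleting first matters because dataset names must be unique: a dataset to be created may want a name still held by one that is about to be removed.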
__init__(base_url, api_key=None, **kw)[source]
Parameters:
  • base_url – Base URL of the Ckan instance, passed to high-level client
  • api_key – API key to be used, passed to high-level client
  • organization_merge_strategy

    One of:

    • ‘create’ (default) if the organization doesn’t exist, create it. Otherwise, leave it alone.
    • ‘update’ if the organization doesn’t exist, create it. Otherwise, update with new values.
  • group_merge_strategy

    One of:

    • ‘create’ (default) if the group doesn’t exist, create it. Otherwise, leave it alone.
    • ‘update’ if the group doesn’t exist, create it. Otherwise, update with new values.
  • dataset_preserve_names – if True (the default) will preserve old names of existing datasets
  • dataset_preserve_organization – if True (the default) will preserve old organizations of existing datasets.
  • dataset_group_merge_strategy
    • ‘add’ add groups, keep old ones (default)
    • ‘replace’ replace all existing groups
    • ‘preserve’ leave groups alone
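
The three dataset_group_merge_strategy values can be illustrated with plain sets of group names. This is a sketch of the described behaviour only; merge_groups is a hypothetical function and the real client may represent groups differently.

```python
def merge_groups(old, new, strategy='add'):
    """Hypothetical helper: combine existing and incoming group names."""
    old, new = set(old), set(new)
    if strategy == 'add':
        return old | new      # add groups, keep old ones
    if strategy == 'replace':
        return new            # replace all existing groups
    if strategy == 'preserve':
        return old            # leave groups alone
    raise ValueError('unknown strategy: {0!r}'.format(strategy))


added = merge_groups(['a', 'b'], ['b', 'c'])
replaced = merge_groups(['a', 'b'], ['b', 'c'], strategy='replace')
preserved = merge_groups(['a', 'b'], ['b', 'c'], strategy='preserve')
```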
sync(source_name, data)[source]

Synchronize data from a source into Ckan.

  • datasets are matched by _harvest_source
  • groups and organizations are matched by name
Parameters:
  • source_name – String identifying the source of the data. Used to build ids that will be used in further synchronizations.
  • data – Data to be synchronized. Should be a dict (or dict-like) with top level keys corresponding to the object type, mapping to dictionaries of {'id': <object>}.
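
As an illustration of the expected shape of data, the following builds a small payload by hand. The top-level key names (dataset, organization, group) are taken from the description of generate_data() in the testing utilities; the ids and field values are made up, and referring to organizations and groups by their source ids is an assumption.

```python
data = {
    'organization': {
        'org-1': {'name': 'org-abc123', 'title': 'Organization abc123'},
    },
    'group': {
        'grp-1': {'name': 'grp-abc123', 'title': 'Group abc123'},
    },
    'dataset': {
        'ds-1': {
            'name': 'dataset-abc123',
            'title': 'Dataset abc123',
            'owner_org': 'org-1',      # assumption: source id of the organization
            'groups': ['grp-1'],       # assumption: source ids of the groups
        },
    },
}

# Sanity check before syncing: every reference must point to a known object.
missing_orgs = [d['owner_org'] for d in data['dataset'].values()
                if d['owner_org'] not in data['organization']]
missing_groups = [g for d in data['dataset'].values()
                  for g in d['groups'] if g not in data['group']]

# With a real client this would then be passed as: client.sync('my-source', data)
```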

Modules

ckan_api_client.exceptions

Exceptions used all over the place

exception ckan_api_client.exceptions.HTTPError(status_code, message, original=None)[source]

Bases: exceptions.Exception

Exception representing an HTTP response error.

status_code

HTTP status code

message

Informative error message, if available

status_code
message
original
exception ckan_api_client.exceptions.BadApiError[source]

Bases: exceptions.Exception

Exception used to mark bad behavior from the API

exception ckan_api_client.exceptions.BadApiWarning[source]

Bases: exceptions.UserWarning

Warning to mark bad behavior from the API

exception ckan_api_client.exceptions.OperationFailure[source]

Bases: exceptions.Exception

Something went wrong, or an expectation failed somewhere.

ckan_api_client.objects

Base objects

Classes to represent / validate Ckan objects.

class ckan_api_client.objects.base.BaseField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: object

Pseudo-descriptor, accepting field names along with instance, to allow better retrieving data for the instance itself.

Warning

Beware that fields shouldn’t carry state of their own, apart from the one used for generic field configuration, as they are shared between instances.

default = None
is_key = False
get(instance, name)[source]

Get the value for the field from the main instance, by looking at the first found in:

  • the updated value
  • the initial value
  • the default value
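
The lookup order above can be sketched with two plain dicts standing in for the per-instance storage. The lookup function and the dict-based storage are assumptions made for illustration, not the way BaseField is actually implemented.

```python
def lookup(name, updated, initial, default=None):
    """Hypothetical helper: resolve a field value in priority order."""
    for values in (updated, initial):
        if name in values:
            return values[name]
    return default


initial = {'title': 'Old title', 'notes': 'Some notes'}
updated = {'title': 'New title'}

title = lookup('title', updated, initial)    # taken from the updated values
notes = lookup('notes', updated, initial)    # falls back to the initial values
url = lookup('url', updated, initial, '')    # falls back to the default
```

Under this model, deleting the modified value for a field simply removes it from the updated dict, so the initial value becomes visible again.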
get_default()[source]
validate(instance, name, value)[source]

The validate method should return the (updated) value to be used as the field value, or raise an exception if the value is not acceptable at all.

set_initial(instance, name, value)[source]

Set the initial value for a field

set(instance, name, value)[source]

Set the modified value for a field

delete(instance, name)[source]

Delete the modified value for a field (logically restores the original one)

serialize(instance, name)[source]

Returns the “serialized” (json-encodable) version of the object.

is_modified(instance, name)[source]

Check whether this field has been modified on the main instance.

is_equivalent(instance, name, other, ignore_key=True)[source]
class ckan_api_client.objects.base.BaseObject(values=None)[source]

Bases: object

Base for the other objects, dispatching get/set/deletes to BaseField instances, if available.

classmethod from_dict(data)[source]
set_initial(values)[source]

Set initial values for all fields

to_dict()[source]
serialize()[source]

Create a serializable representation of the object.

iter_fields()[source]

Iterate over fields in this object, yielding (name, field) pairs.

is_equivalent(other, ignore_key=True)[source]

Equivalency check between objects. Will make sure that values in all the non-key fields match.

Parameters:
  • other – other object to compare
  • ignore_key – if set to True (the default), it will ignore “key” fields during comparison
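
A rough model of the equivalency check, using plain dicts instead of BaseObject instances. The is_equivalent function below is a hypothetical stand-in, and treating id as the only key field is an assumption based on the is_key=True declarations further down.

```python
def is_equivalent(a, b, key_fields=('id',), ignore_key=True):
    """Hypothetical helper: compare two dicts, optionally skipping keys."""
    skipped = set(key_fields) if ignore_key else set()
    names = (set(a) | set(b)) - skipped
    return all(a.get(name) == b.get(name) for name in names)


local = {'id': None, 'name': 'my-dataset', 'title': 'My dataset'}
remote = {'id': 'c0ffee', 'name': 'my-dataset', 'title': 'My dataset'}

same_ignoring_key = is_equivalent(local, remote)
same_with_key = is_equivalent(local, remote, ignore_key=False)
```

Ignoring key fields is what makes it possible to compare an object that has not been created yet (no id) with the one returned by Ckan after creation.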
compare(other)[source]

Compare differences between this object and another

is_modified()[source]

The object is modified if any of its fields reports itself as modified.

Base fields

class ckan_api_client.objects.fields.StringField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.base.BaseField

default = None
validate(instance, name, value)[source]
class ckan_api_client.objects.fields.ListField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.fields.MutableFieldMixin, ckan_api_client.objects.base.BaseField

static default()
validate(instance, name, value)[source]
class ckan_api_client.objects.fields.DictField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.fields.MutableFieldMixin, ckan_api_client.objects.base.BaseField

static default()
validate(instance, name, value)[source]
class ckan_api_client.objects.fields.GroupsField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.fields.SetField

validate(instance, name, value)[source]
class ckan_api_client.objects.fields.ExtrasField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.fields.DictField

validate(instance, name, value)[source]
is_equivalent(instance, name, other, ignore_key=True)[source]

Ckan dataset / resource

class ckan_api_client.objects.ckan_dataset.ResourcesField(default=<object object>, is_key=<object object>, required=False)[source]

Bases: ckan_api_client.objects.fields.ListField

The ResourcesField should behave pretty much as a list field, but will keep track of changes, and make sure all elements are CkanResources.

validate(instance, name, value)[source]
serialize(instance, name)[source]
is_equivalent(instance, name, other, ignore_key=True)[source]
class ckan_api_client.objects.ckan_dataset.CkanDataset(values=None)[source]

Bases: ckan_api_client.objects.base.BaseObject

id = StringField(default=None, is_key=True, required=False)
name = StringField(default=None, is_key=False, required=False)
title = StringField(default=None, is_key=False, required=False)
author = StringField(default='', is_key=False, required=False)
author_email = StringField(default='', is_key=False, required=False)
license_id = StringField(default='', is_key=False, required=False)
maintainer = StringField(default='', is_key=False, required=False)
maintainer_email = StringField(default='', is_key=False, required=False)
notes = StringField(default='', is_key=False, required=False)
owner_org = StringField(default='', is_key=False, required=False)
private = BoolField(default=False, is_key=False, required=False)
state = StringField(default='active', is_key=False, required=False)
type = StringField(default='dataset', is_key=False, required=False)
url = StringField(default='', is_key=False, required=False)
extras = ExtrasField(default=<function <lambda>>, is_key=False, required=False)
groups = GroupsField(default=<function <lambda>>, is_key=False, required=False)
resources = ResourcesField(default=<function <lambda>>, is_key=False, required=False)
tags = SetField(default=<function <lambda>>, is_key=False, required=False)
class ckan_api_client.objects.ckan_dataset.CkanResource(values=None)[source]

Bases: ckan_api_client.objects.base.BaseObject

id = StringField(default=None, is_key=True, required=False)
description = StringField(default='', is_key=False, required=False)
format = StringField(default='', is_key=False, required=False)
mimetype = StringField(default=None, is_key=False, required=False)
mimetype_inner = StringField(default=None, is_key=False, required=False)
name = StringField(default='', is_key=False, required=False)
resource_type = StringField(default='', is_key=False, required=False)
size = StringField(default=None, is_key=False, required=False)
url = StringField(default='', is_key=False, required=False)
url_type = StringField(default=None, is_key=False, required=False)

Ckan group

class ckan_api_client.objects.ckan_group.CkanGroup(values=None)[source]

Bases: ckan_api_client.objects.base.BaseObject

id = StringField(default=None, is_key=True, required=False)
name = StringField(default=None, is_key=False, required=False)
title = StringField(default='', is_key=False, required=False)
approval_status = StringField(default='approved', is_key=False, required=False)
description = StringField(default='', is_key=False, required=False)
image_url = StringField(default='', is_key=False, required=False)
is_organization = BoolField(default=False, is_key=False, required=False)
state = StringField(default='active', is_key=False, required=False)
type = StringField(default='group', is_key=False, required=False)
extras = ExtrasField(default=<function <lambda>>, is_key=False, required=False)
groups = GroupsField(default=<function <lambda>>, is_key=False, required=False)
tags = ListField(default=<function <lambda>>, is_key=False, required=False)

Ckan organization

class ckan_api_client.objects.ckan_organization.CkanOrganization(values=None)[source]

Bases: ckan_api_client.objects.base.BaseObject

id = StringField(default=None, is_key=True, required=False)
name = StringField(default=None, is_key=False, required=False)
title = StringField(default='', is_key=False, required=False)
approval_status = StringField(default='approved', is_key=False, required=False)
description = StringField(default='', is_key=False, required=False)
image_url = StringField(default='', is_key=False, required=False)
is_organization = BoolField(default=True, is_key=False, required=False)
state = StringField(default='active', is_key=False, required=False)
type = StringField(default='organization', is_key=False, required=False)
extras = ExtrasField(default=<function <lambda>>, is_key=False, required=False)
groups = GroupsField(default=<function <lambda>>, is_key=False, required=False)
tags = ListField(default=<function <lambda>>, is_key=False, required=False)

ckan_api_client.utils

class ckan_api_client.utils.IDPair[source]

Bases: ckan_api_client.utils.IDPair

A pair (named tuple) mapping a “source” id with the one used internally in Ckan.

This is mostly used associated with IDMap.

Keys: source_id, ckan_id

class ckan_api_client.utils.SuppressExceptionIf(cond)[source]

Bases: object

Context manager used to suppress exceptions if they match a given condition.

Usage example:

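from ckan_api_client.exceptions import HTTPError
from ckan_api_client.utils import SuppressExceptionIf

is_404 = lambda x: isinstance(x, HTTPError) and x.status_code == 404
with SuppressExceptionIf(is_404):
    client.request(...)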
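
For readers curious how such a context manager can work, here is a minimal self-contained sketch. SuppressIf and the local HTTPError class are stand-ins written for this example; the library's own implementation may differ.

```python
class HTTPError(Exception):
    """Stand-in for ckan_api_client.exceptions.HTTPError."""

    def __init__(self, status_code, message):
        super().__init__(status_code, message)
        self.status_code = status_code
        self.message = message


class SuppressIf(object):
    """Hypothetical re-implementation: suppress exceptions matching cond."""

    def __init__(self, cond):
        self.cond = cond

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_value is None:
            return False  # no exception was raised inside the block
        # Returning True tells Python to swallow the exception.
        return bool(self.cond(exc_value))


is_404 = lambda x: isinstance(x, HTTPError) and x.status_code == 404

with SuppressIf(is_404):
    raise HTTPError(404, 'not found')   # silently ignored

try:
    with SuppressIf(is_404):
        raise HTTPError(500, 'server error')
    propagated = False
except HTTPError:
    propagated = True                   # other errors still propagate
```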
class ckan_api_client.utils.IDMap[source]

Bases: object

Two-way hashmap to map source ids to ckan ids and the other way back.

to_ckan(source_id)[source]

Convert a source id to ckan id

to_source(ckan_id)[source]

Convert a ckan id to source id

add(pair)[source]

Add a new id pair

Parameters:pair (IDPair) – the id pair to be added
Raises:ValueError – if one of the two ids is found in a mismatching pair
remove(pair)[source]

Remove an id pair.

Parameters:pair (IDPair) – the id pair to be removed
Raises:ValueError – if one of the two ids is found in a mismatching pair
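
The two-way mapping and its consistency check can be modelled with a pair of dicts. TwoWayMap below is a hypothetical sketch written for illustration, and the IDPair named tuple is re-declared locally with the documented keys.

```python
from collections import namedtuple

IDPair = namedtuple('IDPair', ['source_id', 'ckan_id'])


class TwoWayMap(object):
    """Hypothetical sketch of a source id <-> ckan id map."""

    def __init__(self):
        self._to_ckan = {}
        self._to_source = {}

    def to_ckan(self, source_id):
        return self._to_ckan[source_id]

    def to_source(self, ckan_id):
        return self._to_source[ckan_id]

    def add(self, pair):
        # Refuse a pair if either id is already bound to a different partner.
        if self._to_ckan.get(pair.source_id, pair.ckan_id) != pair.ckan_id:
            raise ValueError('source id already mapped to another ckan id')
        if self._to_source.get(pair.ckan_id, pair.source_id) != pair.source_id:
            raise ValueError('ckan id already mapped to another source id')
        self._to_ckan[pair.source_id] = pair.ckan_id
        self._to_source[pair.ckan_id] = pair.source_id


id_map = TwoWayMap()
id_map.add(IDPair(source_id='src-1', ckan_id='ckan-1'))
```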
class ckan_api_client.utils.FrozenDict(*a, **kw)[source]

Bases: _abcoll.MutableMapping

Frozen dictionary. Acts as a read-only dictionary, preventing changes and returning frozen objects when asked for values.

class ckan_api_client.utils.FrozenSequence(data)[source]

Bases: _abcoll.Sequence

Base class for the FrozenList/FrozenTuple classes. Acts as a read-only sequence type, returning frozen versions of mutable objects.

class ckan_api_client.utils.FrozenList(data)[source]

Bases: ckan_api_client.utils.FrozenSequence

Immutable list-like.

class ckan_api_client.utils.FrozenTuple(data)[source]

Bases: ckan_api_client.utils.FrozenSequence

Immutable tuple-like.

ckan_api_client.utils.freeze(obj)[source]

Returns the “frozen” version of a mutable type.

Raises:TypeError – if a frozen version for that object doesn’t exist
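
The idea behind freeze() can be approximated with standard-library types. Note that this sketch swaps in a different technique: it freezes nested values eagerly using MappingProxyType and tuple, whereas the Frozen* classes above are described as returning frozen objects when values are requested. freeze_sketch is a hypothetical name.

```python
from types import MappingProxyType


def freeze_sketch(obj):
    """Hypothetical helper: return a read-only version of a dict or list."""
    if isinstance(obj, dict):
        return MappingProxyType(
            {key: _freeze_nested(value) for key, value in obj.items()})
    if isinstance(obj, (list, tuple)):
        return tuple(_freeze_nested(value) for value in obj)
    raise TypeError('no frozen version for {0}'.format(type(obj).__name__))


def _freeze_nested(value):
    # Containers are frozen recursively; other values are returned as-is.
    if isinstance(value, (dict, list, tuple)):
        return freeze_sketch(value)
    return value


frozen = freeze_sketch({'tags': ['a', 'b'], 'extras': {'key': 'value'}})

try:
    frozen['tags'] = []          # read-only mappings reject assignment
    writable = True
except TypeError:
    writable = False
```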
class ckan_api_client.utils.WrappedList(*a, **kw)[source]

Bases: _abcoll.MutableSequence

insert(pos, item)[source]

Testing

All the testing is done via py.test. See the pages below on how to run and write tests.

Fixtures

Documentation of the available fixtures for tests.

Fixture functions

Utility objects

Utility functions

Functions used by fixtures.

Testing utilities

Data generation

ckan_api_client.tests.utils.generate.generate_organization()[source]

Generate a random organization object, with:

  • name, random, example: "org-abc123"
  • title, random, example: "Organization abc123"
  • description, random
  • image, url pointing to a random-generated pic
ckan_api_client.tests.utils.generate.generate_group()[source]

Generate a random group object, with:

  • name, random, example: "grp-abc123"
  • title, random, example: "Group abc123"
  • description, random
  • image, url pointing to a random-generated pic
ckan_api_client.tests.utils.generate.generate_dataset()[source]

Generate a dataset, populated with random data.

Fields:

  • name – random string, in the form dataset-{random}
  • title – random string, in the form Dataset {random}
  • author – random-generated name
  • author_email – random-generated email address
  • license_id – random license id. One of cc-by, cc-zero, cc-by-sa or notspecified.
  • maintainer – random-generated name
  • maintainer_email – random-generated email address
  • notes – random string, containing some markdown
  • owner_org – set to None
  • private – Fixed to False
  • tags – random list of tags (strings)
  • type – fixed string: "dataset"
  • url – random url of dataset on an “external source”
  • extras – dictionary containing random key / value pairs
  • groups – empty list
  • resources – list of random resources
  • relationships – empty list

Note

The owner_org and groups fields will be blank, as they must match existing groups / organizations, and we don’t have access to the database from here (nor is that in the scope of this function).

ckan_api_client.tests.utils.generate.generate_resource()[source]

Generate a random resource, to be put in a dataset.

Fields:

  • url – resource URL on an “external source”
  • resource_type – one of api or file
  • name – random-generated name
  • format – a random format (eg: csv, json)
  • description – random generated string
ckan_api_client.tests.utils.generate.generate_tags(amount)[source]

Generate amount random tags. Each tag is in the form tag-<random-int>.

Returns:a list of tag names
ckan_api_client.tests.utils.generate.generate_extras(amount)[source]

Generate a dict with amount random key/value pairs.

ckan_api_client.tests.utils.generate.generate_data(dataset_count=50, orgs_count=10, groups_count=15)[source]

Generate a bunch of random data. Will also associate datasets with random organizations / groups.

Returns:a dict with the dataset, organization and group keys; each of them a dict of {key: object}.
ckan_api_client.tests.utils.generate.generate_id(length=10)[source]

HTTP Utilities

Utilities for handling / checking HTTP responses

ckan_api_client.tests.utils.http.check_response_ok(response, status_code=200)[source]

Warning

deprecated function. Use check_api_v3_response().

ckan_api_client.tests.utils.http.check_response_error(response, status_code)[source]

Warning

deprecated function. Use check_api_v3_error().

ckan_api_client.tests.utils.http.check_api_v3_response(response, status_code=200)[source]

Make sure that response is a valid successful response from API v3.

  • check http status code to be in the 200-299 range
  • check http status code to match status_code
  • check content-type to be application/json
  • check charset to be utf-8
  • check content body to be valid json
  • make sure response object contains the success, result and help keys.
  • check that success is True
  • check that error key is not in the response
Parameters:
  • response – a requests response
  • status_code – http status code to be checked (default: 200)
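
The list of checks above translates fairly directly into assertions. The sketch below is an illustration only: check_v3_response is a hypothetical name, and FakeResponse is a minimal stand-in exposing the status_code, headers and json() members used here (with a plain dict and lowercase header names instead of the case-insensitive headers of a real requests response).

```python
import json


class FakeResponse(object):
    """Minimal stand-in for a requests response object."""

    def __init__(self, status_code, headers, text):
        self.status_code = status_code
        self.headers = headers
        self.text = text

    def json(self):
        return json.loads(self.text)


def check_v3_response(response, status_code=200):
    """Hypothetical helper mirroring the checks listed above."""
    assert 200 <= response.status_code < 300
    assert response.status_code == status_code
    parts = [p.strip().lower()
             for p in response.headers['content-type'].split(';')]
    assert parts[0] == 'application/json'
    assert 'charset=utf-8' in parts[1:]
    body = response.json()  # raises ValueError on invalid json
    for key in ('success', 'result', 'help'):
        assert key in body
    assert body['success'] is True
    assert 'error' not in body
    return body


ok = FakeResponse(
    200,
    {'content-type': 'application/json;charset=utf-8'},
    json.dumps({'success': True, 'result': ['a', 'b'], 'help': '...'}),
)
body = check_v3_response(ok)
```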
ckan_api_client.tests.utils.http.check_api_v3_error(response, status_code)[source]

Make sure that response is a valid error response from API v3.

  • check http status code to match status_code
Parameters:
  • response – a requests response
  • status_code – http status code to be checked

String-related

String generation functions.

ckan_api_client.tests.utils.strings.generate_password(length=20)[source]

Generate random password of the given length.

Beware that the string will be generated as random data from urandom, and returned as a hexadecimal string of twice the given length.
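
The doubling in length follows from hex encoding, where each random byte becomes two characters. A minimal sketch of that behaviour (generate_password_sketch is a hypothetical name, not the library function):

```python
import binascii
import os


def generate_password_sketch(length=20):
    """Hypothetical helper: hex-encode `length` random bytes."""
    # Each byte is rendered as two hexadecimal digits.
    return binascii.hexlify(os.urandom(length)).decode('ascii')


password = generate_password_sketch(20)
```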

ckan_api_client.tests.utils.strings.generate_random_alphanum(length=10)[source]

Generate a random string, made of ascii letters + digits

ckan_api_client.tests.utils.strings.gen_random_id(length=10)[source]

Generate a random id, made of lowercase ascii letters + digits

ckan_api_client.tests.utils.strings.gen_dataset_name()[source]

Generate a random dataset name

ckan_api_client.tests.utils.strings.gen_picture(s, size=200)[source]

Generate URL to picture from some text hash

ckan_api_client.tests.utils.strings.gen_gravatar(s, size=200)[source]

Return URL for gravatar of md5 of string s

ckan_api_client.tests.utils.strings.gen_robohash(s, size=200)[source]

Return URL for robohash pic for sha1 hash of string s

Validation

Utility functions to validate expectations

ckan_api_client.tests.utils.validation.check_dataset(dataset, expected)[source]

Make sure dataset matches the expected one

ckan_api_client.tests.utils.validation.check_group(group, expected)[source]

Make sure group matches the expected one

ckan_api_client.tests.utils.validation.check_organization(organization, expected)[source]

Make sure organization matches the expected one
