Wikidata client library for Python¶
This package provides easy APIs to use Wikidata for Python.
>>> from wikidata.client import Client
>>> client = Client() # doctest: +SKIP
>>> entity = client.get('Q20145', load=True)
>>> entity
<wikidata.entity.Entity Q20145 'IU'>
>>> entity.description
m'South Korean singer and actress'
>>> image_prop = client.get('P18')
>>> image = entity[image_prop]
>>> image
<wikidata.commonsmedia.File 'File:KBS "The Producers" press conference, 11 May 2015 10.jpg'>
>>> image.image_resolution
(820, 1122)
>>> image.image_url
'https://upload.wikimedia.org/wikipedia/commons/6/60/KBS_%22The_Producers%22_press_conference%2C_11_May_2015_10.jpg'
wikidata
— Wikidata client library¶
wikidata.cache
— Caching policies¶
Changed in version 0.5.0.
-
class
wikidata.cache.
CachePolicy
¶ Interface for caching policies.
-
get
(key: CacheKey) → Optional[CacheValue]¶ Look up a cached value by its key.
Parameters: key (CacheKey) – The key string to look up a cached value.
Returns: The cached value if it exists. None if there’s no such key.
Return type: Optional[CacheValue]
-
set
(key: CacheKey, value: Optional[CacheValue]) → None¶ Create or update a cache.
Parameters:
- key (CacheKey) – A key string to create or update.
- value (Optional[CacheValue]) – A value to cache. None to remove cache.
-
-
wikidata.cache.
CacheValue
(x)¶ The type of cached values.
-
class
wikidata.cache.
MemoryCachePolicy
(max_size: int = 128)¶ LRU (least recently used) cache in memory.
Parameters: max_size (int) – The maximum number of values to cache. 128 by default.
-
get
(key: CacheKey) → Optional[CacheValue]¶ Look up a cached value by its key.
Parameters: key (CacheKey) – The key string to look up a cached value.
Returns: The cached value if it exists. None if there’s no such key.
Return type: Optional[CacheValue]
-
set
(key: CacheKey, value: Optional[CacheValue]) → None¶ Create or update a cache.
Parameters:
- key (CacheKey) – A key string to create or update.
- value (Optional[CacheValue]) – A value to cache. None to remove cache.
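The LRU behaviour described for MemoryCachePolicy can be illustrated with a small sketch. This is not the library's implementation: the class name TinyLRU is hypothetical, and it only mirrors the get/set contract documented above (setting None removes an entry) on top of collections.OrderedDict.

```python
from collections import OrderedDict
from typing import Optional


class TinyLRU:
    """Hypothetical illustration of an LRU cache with a get/set contract."""

    def __init__(self, max_size: int = 128) -> None:
        self.max_size = max_size
        self._values: "OrderedDict[str, object]" = OrderedDict()

    def get(self, key: str) -> Optional[object]:
        if key not in self._values:
            return None
        # A hit makes the key the most recently used one.
        self._values.move_to_end(key)
        return self._values[key]

    def set(self, key: str, value: Optional[object]) -> None:
        if value is None:
            # Setting None removes the entry, as in the interface above.
            self._values.pop(key, None)
            return
        self._values[key] = value
        self._values.move_to_end(key)
        # Evict the least recently used entries beyond max_size.
        while len(self._values) > self.max_size:
            self._values.popitem(last=False)


cache = TinyLRU(max_size=2)
cache.set('a', 1)
cache.set('b', 2)
cache.get('a')     # 'a' becomes the most recently used key
cache.set('c', 3)  # evicts 'b', the least recently used key
```

With a capacity of two, the lookup of 'a' before inserting 'c' is what decides that 'b' is the entry to be evicted.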
-
-
class
wikidata.cache.
NullCachePolicy
¶ No-op cache policy.
-
get
(key: CacheKey) → Optional[CacheValue]¶ Look up a cached value by its key.
Parameters: key (CacheKey) – The key string to look up a cached value.
Returns: The cached value if it exists. None if there’s no such key.
Return type: Optional[CacheValue]
-
set
(key: CacheKey, value: Optional[CacheValue]) → None¶ Create or update a cache.
Parameters:
- key (CacheKey) – A key string to create or update.
- value (Optional[CacheValue]) – A value to cache. None to remove cache.
-
-
class
wikidata.cache.
ProxyCachePolicy
(cache_object, timeout: int, property_timeout: Optional[int] = None, namespace: str = 'wd_')¶ This policy is a proxy, or adapter, to another cache object. A cache object can be anything that satisfies the following interface:
    def get(key: str) -> Optional[bytes]: pass
    def set(key: str, value: bytes, timeout: int=0) -> None: pass
    def delete(key: str) -> None: pass
(The above methods omit self parameters.) It’s compatible with the de facto interface of caching libraries in Python (e.g. python-memcached, werkzeug.contrib.cache).
Parameters:
- cache_object – The cache object to adapt. Read the above explanation.
- timeout (int) – Lifespan of every cache in seconds. 0 means no expiration.
- property_timeout (int) – Lifespan of caches for properties (in seconds). Since properties don’t change frequently, and their changes usually have little effect, a longer lifespan for property caches can be useful. 0 means no expiration. Set to the same value as timeout by default.
- namespace (str) – The common prefix attached to every cache key. 'wd_' by default.
-
get
(key: CacheKey) → Optional[CacheValue]¶ Look up a cached value by its key.
Parameters: key (CacheKey) – The key string to look up a cached value.
Returns: The cached value if it exists. None if there’s no such key.
Return type: Optional[CacheValue]
-
set
(key: CacheKey, value: Optional[CacheValue]) → None¶ Create or update a cache.
Parameters:
- key (CacheKey) – A key string to create or update.
- value (Optional[CacheValue]) – A value to cache. None to remove cache.
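As a rough illustration of the cache object interface that ProxyCachePolicy expects (get, set with a timeout, and delete), here is a minimal in-process sketch. The class name DictCache is hypothetical and not part of the library; it is only meant to show the shape of the three methods, and it has not been exercised against ProxyCachePolicy itself.

```python
import time
from typing import Dict, Optional, Tuple


class DictCache:
    """Hypothetical in-process cache object with get/set/delete."""

    def __init__(self) -> None:
        # Maps key -> (value, absolute expiry time or None).
        self._store: Dict[str, Tuple[bytes, Optional[float]]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Expired entries are dropped lazily on lookup.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: bytes, timeout: int = 0) -> None:
        # A timeout of 0 means no expiration, as in the parameter docs.
        expires_at = time.monotonic() + timeout if timeout else None
        self._store[key] = (value, expires_at)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


cache_object = DictCache()
cache_object.set('wd_Q20145', b'{"id": "Q20145"}')
```

Going by the constructor signature above, such an object would presumably be handed over as ProxyCachePolicy(cache_object, timeout=3600); the key 'wd_Q20145' here merely imitates the default 'wd_' namespace prefix.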
wikidata.client
— Client session¶
-
wikidata.client.
WIKIDATA_BASE_URL
= 'https://www.wikidata.org/'¶ (str) The default base_url of the Client constructor.
Changed in version 0.3.0: Following the change in meaning of the Client constructor’s base_url parameter, it became https://www.wikidata.org/ instead of https://www.wikidata.org/wiki/ (which contained the trailing path wiki/).
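To make the post-0.3.0 meaning of the base URL concrete, the sketch below joins a site-relative path onto it. The helper name entity_data_url and the use of the Special:EntityData path are assumptions made for illustration; this section does not state which paths the client actually requests.

```python
from urllib.parse import urljoin

WIKIDATA_BASE_URL = 'https://www.wikidata.org/'


def entity_data_url(base_url: str, entity_id: str) -> str:
    """Hypothetical helper: build an entity data URL under base_url."""
    # The path includes the 'wiki/' segment itself, because the base URL
    # no longer carries it.
    path = 'wiki/Special:EntityData/{}.json'.format(entity_id)
    return urljoin(base_url, path)


url = entity_data_url(WIKIDATA_BASE_URL, 'Q20145')
```

Since the base URL is the site root, any path under the site (not only those below wiki/) can be derived from it in the same way.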
-
class
wikidata.client.
Client
(base_url: str = 'https://www.wikidata.org/', opener: Optional[urllib.request.OpenerDirector] = None, datavalue_decoder: Union[Decoder, Callable[[Client, str, Mapping[str, object]], object], None] = None, entity_type_guess: bool = True, cache_policy: wikidata.cache.CachePolicy = <wikidata.cache.NullCachePolicy object>, repr_string: Optional[str] = None)¶ Wikidata client session.
Parameters:
- base_url (str) – The base URL of Wikidata. WIKIDATA_BASE_URL is used by default.
- opener (urllib.request.OpenerDirector) – The opener for urllib.request. If omitted or None, the default opener is used.
- entity_type_guess (bool) – Whether to guess the type of an Entity from its id to make fewer HTTP requests. True by default.
- cache_policy (CachePolicy) – A caching policy for API calls. No cache (NullCachePolicy) by default.
New in version 0.5.0: The cache_policy option.
Changed in version 0.3.0: The meaning of the base_url parameter changed. It originally meant https://www.wikidata.org/wiki/, which contained the trailing path wiki/, but now it means only https://www.wikidata.org/.
New in version 0.2.0: The entity_type_guess option.
-
cache_policy
= <wikidata.cache.NullCachePolicy object>¶ (CachePolicy) A caching policy for API calls.
New in version 0.5.0.
-
datavalue_decoder
= None¶ (Union[Decoder, Callable[[Client, str, Mapping[str, object]], object]]) The function to decode the given datavalue. It’s typically an instance of Decoder or its subclass.
-
decode_datavalue
(datatype: str, datavalue: Mapping[str, object]) → object¶ Decode the given datavalue using the configured datavalue_decoder.
New in version 0.3.0.
-
entity_type_guess
= True¶ (bool) Whether to guess the type of an Entity from its id to make fewer HTTP requests.
New in version 0.2.0.
-
get
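(entity_id: EntityId, load: bool = False) → wikidata.entity.Entity¶ Get a Wikidata entity by its EntityId.
Parameters:
- entity_id (EntityId) – The ID of the entity to get.
- load (bool) – Whether to load the entity eagerly. False (lazy loading) by default.
Returns: The found entity.
Return type: Entity
New in version 0.3.0: The load option.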
-
guess_entity_type
(entity_id: EntityId) → Optional[wikidata.entity.EntityType]¶ Guess the EntityType from the given EntityId. It can return None when it fails to guess.
Note
It always fails to guess when entity_type_guess is configured to False.
Returns: The guessed EntityType, or None if it fails to guess.
Return type: Optional[EntityType]
New in version 0.2.0.
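One way such a guess could work without any HTTP request is to look at the ID prefix. The sketch below is an assumption, not the library's code: it relies on the Wikidata convention that item IDs start with Q (as in Q20145) and property IDs with P (as in P18), and it redefines a stand-in EntityType so that it runs on its own.

```python
import enum
from typing import Optional


class EntityType(enum.Enum):
    """Stand-in for wikidata.entity.EntityType, so the sketch is self-contained."""
    item = 'item'
    property = 'property'


# Assumed prefix convention: item IDs start with 'Q', property IDs with 'P'.
_PREFIXES = {'Q': EntityType.item, 'P': EntityType.property}


def guess_entity_type(entity_id: str,
                      entity_type_guess: bool = True) -> Optional[EntityType]:
    """Guess the entity type from the ID alone, or return None."""
    if not entity_type_guess or not entity_id:
        # Guessing is switched off, or there is nothing to inspect.
        return None
    return _PREFIXES.get(entity_id[0])
```

An unknown prefix simply yields None, which matches the documented "fails to guess" outcome.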
wikidata.commonsmedia
— Wikimedia Commons¶
New in version 0.3.0.
-
class
wikidata.commonsmedia.
File
(client: wikidata.client.Client, title: str)¶ Represent a file on Wikimedia Commons.
wikidata.datavalue
— Interpreting datavalues¶
This module provides the decoder interface for customizing how datavalues are
decoded, and the default Decoder implementation.
Technically the interface is just a callable, so an implementation
doesn’t necessarily have to be an instance of Decoder or its subclass;
it only needs to satisfy:
    typing.Callable[[wikidata.client.Client, str, typing.Mapping[str, object]], object]
New in version 0.3.0.
-
exception
wikidata.datavalue.
DatavalueError
(*args)¶ Exception raised while decoding datavalues. It subclasses ValueError as well.
-
datavalue
¶ The datavalue which caused the decoding error.
-
-
class
wikidata.datavalue.
Decoder
¶ Decode the given datavalue to a value of the appropriate Python type. For extensibility it uses the visitor pattern and is intended to be subclassed. To customize the decoding of datavalues, subclass it and set the datavalue_decoder option of Client to the customized decoder.
It automatically invokes an appropriate visitor method using a simple naming rule: {datatype}__{datavalue[type]}. For example, if the following call to a decoder was made:
    decoder(client, 'mydatatype', {'type': 'mytype', 'value': '...'})
it’s delegated to the following visitor method call:
    decoder.mydatatype__mytype(client, {'type': 'mytype', 'value': '...'})
If a decoder fails to find a visitor method matching the {datatype}__{datavalue[type]} pattern, it then tries to find a general version of the visitor method, {datavalue[type]}, which lacks the double underscores. For example, for the following call:
    decoder(client, 'mydatatype', {'type': 'mytype', 'value': '...'})
it first tries to find the following visitor method:
    decoder.mydatatype__mytype
but if there’s no such method, it then tries to find the following general visitor method:
    decoder.mytype
This two-step dispatch is useful for making a visitor method match regardless of datatype.
If datavalue[type] contains hyphens, they’re replaced by underscores. For example:
    decoder(client, 'string', {'type': 'wikibase-entityid', 'value': 'a text value'})
the above call is delegated to the following visitor method call:
    decoder.string__wikibase_entityid(
        # Note the underscore in wikibase_entityid
        client,
        {'type': 'wikibase-entityid', 'value': 'a text value'}
    )
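The naming rule can be sketched as a standalone class. SketchDecoder and its visitor methods are hypothetical and do not reproduce the real Decoder; the sketch covers only the lookup order and the hyphen replacement described above, and it raises a plain ValueError where the library documents DatavalueError.

```python
from typing import Mapping


class SketchDecoder:
    """Hypothetical decoder showing only the two-step name dispatch."""

    def __call__(self, client: object, datatype: str,
                 datavalue: Mapping[str, object]) -> object:
        # Hyphens in datavalue['type'] are replaced by underscores.
        type_name = str(datavalue['type']).replace('-', '_')
        # First try the specific method, then the general one.
        for method_name in (datatype + '__' + type_name, type_name):
            method = getattr(self, method_name, None)
            if callable(method):
                return method(client, datavalue)
        raise ValueError('no visitor method for {!r}'.format(datavalue))

    def string(self, client: object,
               datavalue: Mapping[str, object]) -> object:
        # General visitor: matches type 'string' for any datatype.
        return datavalue['value']

    def url__string(self, client: object,
                    datavalue: Mapping[str, object]) -> object:
        # Specific visitor: only datatype 'url' with type 'string'.
        return ('url', datavalue['value'])

    def wikibase_entityid(self, client: object,
                          datavalue: Mapping[str, object]) -> object:
        # General visitor reached through the hyphen replacement.
        return datavalue['value']


decoder = SketchDecoder()
specific = decoder(None, 'url', {'type': 'string', 'value': 'https://example.org/'})
general = decoder(None, 'external-id', {'type': 'string', 'value': 'abc'})
entity_ref = decoder(None, 'wikibase-item',
                     {'type': 'wikibase-entityid', 'value': {'id': 'Q1'}})
```

Here url__string wins for the 'url' datatype, while every other datatype carrying a 'string' value falls back to the general string method.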
wikidata.entity
— Wikidata entities¶
-
class
wikidata.entity.
Entity
(id: EntityId, client: Client)¶ Wikidata entity. Can be an item or a property. Its attributes can be lazily loaded.
To get an entity, use the Client.get() method instead of the constructor of Entity.
Note
Although it implements Mapping[EntityId, object], it actually is a multidict. See also the getlist() method.
Changed in version 0.2.0: Implemented the Mapping[EntityId, object] protocol for easy access to statement values.
Changed in version 0.2.0: Implemented the Hashable protocol and ==/!= operators for equality tests.
-
state
¶ (EntityState) The loading state.
New in version 0.7.0.
-
getlist
(key: wikidata.entity.Entity) → Sequence[object]¶ Return all values associated with the given key property, in sequence.
Parameters: key (Entity) – The property entity.
Returns: A sequence of all values associated with the given key property. It can be empty if nothing is associated with the property.
Return type: Sequence[object]
-
lists
() → Sequence[Tuple[wikidata.entity.Entity, Sequence[object]]]¶ Similar to items(), except that each returned pair holds a list of values instead of a single value.
Returns: The pairs of (key, values) where values is a sequence.
Return type: Sequence[Tuple[Entity, Sequence[object]]]
-
type
¶ (EntityType) The type of entity, item or property.
New in version 0.2.0.
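The multidict behaviour (a Mapping on the surface, with getlist() and lists() exposing every value) can be sketched as follows. MultiValueMapping is a hypothetical stand-in keyed by plain strings rather than Entity objects, and the choice that plain indexing yields the first value is an assumption of this sketch.

```python
from typing import Dict, Iterator, List, Mapping, Sequence, Tuple


class MultiValueMapping(Mapping[str, object]):
    """Hypothetical multidict: a Mapping view plus getlist() and lists()."""

    def __init__(self, data: Dict[str, List[object]]) -> None:
        self._data = data

    def __getitem__(self, key: str) -> object:
        values = self.getlist(key)
        if not values:
            raise KeyError(key)
        # Assumption: plain indexing yields the first value.
        return values[0]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def getlist(self, key: str) -> Sequence[object]:
        # Empty if nothing is associated with the key.
        return list(self._data.get(key, []))

    def lists(self) -> Sequence[Tuple[str, Sequence[object]]]:
        # Like items(), but each pair carries all values of the key.
        return [(key, self.getlist(key)) for key in self]


statements = MultiValueMapping({
    'P18': ['first.jpg', 'second.jpg'],
    'P106': ['singer'],
})
```

Indexing statements['P18'] gives a single value, whereas statements.getlist('P18') gives both, which is the distinction the note on Entity draws.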
-
-
class
wikidata.entity.
EntityState
¶ Defines the state of an Entity.
New in version 0.7.0.
-
loaded
= 'loaded'¶ (EntityState) The entity exists and is already loaded.
-
non_existent
= 'non_existent'¶ (EntityState) The entity does not exist.
-
not_loaded
= 'not_loaded'¶ (EntityState) Not loaded yet. It is unknown whether the entity exists or not.
-
-
class
wikidata.entity.
EntityType
¶ The enumerated type which consists of two possible values:
New in version 0.2.0.
-
item
= 'item'¶ (EntityType) Items are Entity objects that are typically represented by a Wikipage (at least in some Wikipedia languages). They can be viewed as “the thing that a Wikipage is about,” which could be an individual thing (the person Albert Einstein), a general class of things (the class of all Physicists), or any other concept that is the subject of some Wikipedia page (including things like History of Berlin).
See also
- Items — Wikibase Data Model
- The data model of Wikibase describes the structure of the data that is handled in Wikibase.
-
property
= 'property'¶ (EntityType) Properties are Entity objects that describe a relationship between items (or other Entity objects) and values of the property. Typical properties are population (using numbers as values) and binomial name (using strings as values), but also has father and author of (both using items as values).
See also
- Properties — Wikibase Data Model
- The data model of Wikibase describes the structure of the data that is handled in Wikibase.
-
wikidata.globecoordinate
— Globe coordinate¶
New in version 0.7.0.
-
class
wikidata.globecoordinate.
GlobeCoordinate
(latitude: float, longitude: float, globe: wikidata.entity.Entity, precision: float)¶ Literal data for a geographical position given as a latitude-longitude pair in gms or decimal degrees for the given stellar body.
wikidata.multilingual
— Multilingual texts¶
-
wikidata.multilingual.
Locale
(x)¶ The locale of each MonolingualText or the internal mapping of each MultilingualText. Alias of str.
New in version 0.7.0.
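A string-based locale pairs naturally with a text type that is itself a str. The sketch below assumes Locale is declared with typing.NewType (which the (x) signature suggests) and uses a hypothetical LocalizedText class; it is not the library's MonolingualText.

```python
from typing import NewType

# Assumption: a NewType over str, so locales are plain strings at runtime.
Locale = NewType('Locale', str)


class LocalizedText(str):
    """Hypothetical str subtype that remembers its locale."""

    locale: Locale

    def __new__(cls, text: str, locale: Locale) -> 'LocalizedText':
        # str is immutable, so the text is fixed in __new__.
        self = super().__new__(cls, text)
        self.locale = locale
        return self


description = LocalizedText('South Korean singer and actress', Locale('en'))
```

Because the value is a true str, it can be passed anywhere a string is expected while still carrying its locale.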
wikidata.quantity
— Quantity¶
New in version 0.7.0.
-
class
wikidata.quantity.
Quantity
(amount: float, lower_bound: Optional[float], upper_bound: Optional[float], unit: Optional[wikidata.entity.Entity])¶ A Quantity value represents a decimal number, together with information about the uncertainty interval of this number, and a unit of measurement.
Contributing¶
How to run tests¶
As this project supports various Python interpreters (CPython and PyPy) and
versions, we use tox to ensure it works well with all of them. You don’t need to
create a virtual environment yourself: tox automatically creates
virtual environments for the various Python versions and runs the same test suite
on all of them.
The easiest way to install tox is to use pip [1]:
    pip install tox
Once you’ve installed tox, it’s very simple to run the test suite on
all Python versions this project aims to support:
    tox
Note that you need to install the Python interpreters yourself, in addition to tox.
If you don’t want to install all of them, use the --skip-missing-interpreters option:
    tox --skip-missing-interpreters
To run tests on multiple interpreters at a time, use the --parallel option:
    tox --parallel
[1] See also tox’s official docs.
Changelog¶
Version 0.7.0¶
Released on July 31, 2020.
- Marked the package as supporting type checking by following PEP 561.
- Non-existent entities can now be handled. [#11]
  - Added EntityState enum class.
  - Added Entity.state attribute.
  - Fixed a bug that raised HTTPError when a non-existent Entity was requested.
- Languages (locales) are no longer represented as babel.core.Locale, but as wikidata.multilingual.Locale instead. [#2, #27, #30 by Nelson Liu]
  - Removed Babel from the dependencies.
  - Added wikidata.multilingual.Locale type.
  To replace the babel.core.Locale type, the wikidata.multilingual.Locale type has been aliased to str. This is a breaking change for all Wikidata public API functions that formerly returned or ingested babel.core.Locale.
- Added support for time datatypes with precision 9 (year-only). [#26 by Nelson Liu]
- Added support for globe coordinate datatype. [#28 by Nelson Liu]
  - Added support for decoding the globe-coordinate datatype.
  - Added wikidata.globecoordinate module.
- Added support for quantity datatype. [#29 by Nelson Liu]
  - Added support for decoding the quantity datatype.
  - Added wikidata.quantity module. [#29]
- Fixed KeyError from Entity.getlist() if the property is explicitly associated with “no value”. [#18]
- Fixed a bug that raised KeyError when accessing an image more than once and MemoryCachePolicy was enabled. [#24 by Héctor Cordobés]
Version 0.6.1¶
Released on September 18, 2017.
- Fixed ImportError on Python 3.4 due to lack of typing module. [#4]
Version 0.6.0¶
Released on September 12, 2017.
- Fixed KeyError from Client.get() when an entity is redirected to its canonical entity.
Version 0.5.4¶
Released on September 18, 2017.
- Fixed ImportError on Python 3.4 due to lack of typing module. [#4]
Version 0.5.3¶
Released on June 30, 2017.
- Fixed ValueError from Entity.label/Entity.description with languages ISO 639-1 doesn’t cover (e.g. cbk-zam). [#2]
  Although this fix prevents these properties from raising ValueError, it doesn’t completely fix the problem. The babel.core.Locale type, which Wikidata depends on, currently doesn’t support languages other than ISO 639-1. In order to completely fix the problem, we need to patch Babel to support them, or make Wikidata independent from Babel.
Version 0.5.1¶
Released on June 28, 2017.
- Fixed AssertionError from len() or iterating (iter()) on Entity objects with empty claims.
Version 0.5.0¶
Released on June 13, 2017.
- Wikidata API calls over the network can now be cached.
  - Client now has a cache_policy attribute and constructor option. Nothing is cached by default.
  - Added wikidata.cache module and the CachePolicy interface in it. Three built-in implementations of the interface were added:
    - NullCachePolicy: No-op.
    - MemoryCachePolicy: LRU cache in memory.
    - ProxyCachePolicy: Proxy/adapter to another cache object. Useful for utilizing third-party cache libraries.
- The wikidata.client.Client.request logger now records cache hits at the DEBUG level.
Version 0.4.4¶
Released on June 30, 2017.
- Fixed ValueError from Entity.label/Entity.description with languages ISO 639-1 doesn’t cover (e.g. cbk-zam). [#2]
  Although this fix prevents these properties from raising ValueError, it doesn’t completely fix the problem. The babel.core.Locale type, which Wikidata depends on, currently doesn’t support languages other than ISO 639-1. In order to completely fix the problem, we need to patch Babel to support them, or make Wikidata independent from Babel.
Version 0.4.2¶
Released on June 28, 2017.
- Fixed AssertionError from len() or iterating (iter()) on Entity objects with empty claims.
Version 0.4.1¶
Released on April 30, 2017.
- Fixed AssertionError from getlist() on entities with empty claims.
Version 0.4.0¶
Released on April 24, 2017.
- Monolingual texts can now be handled.
  - Added MonolingualText type which is a true subtype of str.
Version 0.3.0¶
Released on February 23, 2017.
- Client can now customize how it decodes datavalues to Python objects.
  - Added wikidata.datavalue module and Decoder class inside it.
  - Added datavalue_decoder option to Client.
- Files on Wikimedia Commons can now be handled.
  - The new decoder can parse Wikimedia Commons files, e.g. images.
  - Added wikidata.commonsmedia module and File class inside it.
- The meaning of the Client constructor’s base_url parameter changed so that it no longer contains the trailing path wiki/ of https://www.wikidata.org/wiki/. As its meaning changed, the value of the WIKIDATA_BASE_URL constant also changed to not have the trailing path.
- Added load option to Client.get() method.
Version 0.2.0¶
Released on February 19, 2017.
- Made Entity a multidict. Now it satisfies the Mapping[Entity, object] protocol.
- Added Entity.type property and EntityType enum class to represent it.
- Added entity_type_guess option and guess_entity_type() method to Client class.
- Implemented the Hashable protocol and ==/!= operators on Entity for equality tests.
Version 0.1.0¶
Initial version. Released on February 15, 2017.