jsonstat.py

jsonstat.py is a library for reading the JSON-stat data format maintained and promoted by Xavier Badosa. JSON-stat is a JSON format for publishing datasets, and it is used by several institutions to publish statistical data.
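A quick sketch of typical usage, using the oecd-canada sample from json-stat.org (the same file explored in the notebooks below):

import jsonstat

# download (or read from the local cache) the sample collection and parse it
collection = jsonstat.from_url('http://json-stat.org/samples/oecd-canada.json')
oecd = collection.dataset('oecd')           # select a dataset by name
print(oecd.value(area='IT', year='2012'))   # -> 10.55546863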

Contents:

Notebooks

Notebook: using jsonstat.py python library with jsonstat format version 1.

This Jupyter notebook shows the python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format see the official site. This example shows how to explore the sample data file oecd-canada from the json-stat.org site. This file is compliant with version 1 of JSON-stat.

# all import here
from __future__ import print_function
import os
import pandas as pd  # using pandas to convert the jsonstat dataset into a pandas DataFrame
import jsonstat      # import the jsonstat.py package

import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

Download the file oecd-canada.json or use the cached copy. Caching the file on disk makes it possible to work off-line and speeds up the exploration of the data.

url = 'http://json-stat.org/samples/oecd-canada.json'
file_name = "oecd-canada.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada.json

Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.

collection = jsonstat.from_file(file_path)
collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+----------+
| pos | dataset  |
+-----+----------+
| 0   | 'oecd'   |
| 1   | 'canada' |
+-----+----------+

Select the dataset named oecd. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.

oecd = collection.dataset('oecd')
oecd
name: 'oecd'
label: 'Unemployment rate in the OECD countries 2003-2014'
source: 'Unemployment rate in the OECD countries 2003-2014'
size: 3
+-----+---------+--------------------------------+------+--------+
| pos | id      | label                          | size | role   |
+-----+---------+--------------------------------+------+--------+
| 0   | concept | indicator                      | 1    | metric |
| 1   | area    | OECD countries, EU15 and total | 36   | geo    |
| 2   | year    | 2003-2014                      | 12   | time   |
+-----+---------+--------------------------------+------+--------+

Show some detailed info about the dimensions.

oecd.dimension('concept')
+-----+-------+---------------------+
| pos | idx   | label               |
+-----+-------+---------------------+
| 0   | 'UNR' | 'unemployment rate' |
+-----+-------+---------------------+
oecd.dimension('area')
+-----+------+-------------+
| pos | idx  | label       |
+-----+------+-------------+
| 0   | 'AU' | 'Australia' |
| 1   | 'AT' | 'Austria'   |
| 2   | 'BE' | 'Belgium'   |
| 3   | 'CA' | 'Canada'    |
| ... | ...  | ...         |
+-----+------+-------------+
oecd.dimension('year')
+-----+--------+-------+
| pos | idx    | label |
+-----+--------+-------+
| 0   | '2003' | ''    |
| 1   | '2004' | ''    |
| 2   | '2005' | ''    |
| 3   | '2006' | ''    |
| ... | ...    | ...   |
+-----+--------+-------+

Accessing values in the dataset

Print the value in oecd dataset for area = IT and year = 2012

oecd.data(area='IT', year='2012')
JsonStatValue(idx=201, value=10.55546863, status=None)
oecd.value(area='IT', year='2012')
10.55546863
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128
5.39663128
oecd.value(concept='UNR',area='AU',year='2004')
5.39663128

Transforming the dataset into a pandas DataFrame

df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()
concept area Value
year
2003 UNR AU 5.943826
2004 UNR AU 5.396631
2005 UNR AU 5.044791
2006 UNR AU 4.789363
2007 UNR AU 4.379649
df_oecd['area'].describe() # area contains 36 values
count     432
unique     36
top        JP
freq       12
Name: area, dtype: object

Extract a subset of data into a pandas DataFrame from the jsonstat dataset. We can transform the dataset by freezing the dimension area to a specific country (Canada).

df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()
concept area Value
year
2010 UNR CA 7.988900
2011 UNR CA 7.453610
2012 UNR CA 7.323584
2013 UNR CA 7.169742
2014 UNR CA 6.881227
df_oecd_ca['area'].describe()  # area contains only one value (CA)
count     12
unique     1
top       CA
freq      12
Name: area, dtype: object
df_oecd_ca.plot(grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x113980908>
_images/oecd-canada-jsonstat_v1_24_1.png

Transforming a dataset into a python list

oecd.to_table()[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to transform jsonstat data into a table with the dimensions in a different order

order = [i.did() for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]

Notebook: using jsonstat.py python library with jsonstat format version 2.

This Jupyter notebook shows the python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format see the official site.

This notebook uses the data file oecd-canada-col.json from the json-stat.org site. This file is compliant with version 2 of JSON-stat. This notebook is identical to the version 1 notebook; the only difference is the data source.

# all import here
from __future__ import print_function
import os
import pandas as pd  # using pandas to convert the jsonstat dataset into a pandas DataFrame
import jsonstat      # import the jsonstat.py package

import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

Download the file oecd-canada-col.json or use the cached copy. Caching the file on disk makes it possible to work off-line and speeds up the exploration of the data.

url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada-col.json

Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.

collection = jsonstat.from_file(file_path)
collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+------------------------------------------------------+
| pos | dataset                                              |
+-----+------------------------------------------------------+
| 0   | 'Unemployment rate in the OECD countries 2003-2014'  |
| 1   | 'Population by sex and age group. Canada. 2012'      |
+-----+------------------------------------------------------+

Select the first dataset. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.

oecd = collection.dataset(0)
oecd
name: 'Unemployment rate in the OECD countries 2003-2014'
label: 'Unemployment rate in the OECD countries 2003-2014'
size: 3
+-----+---------+--------------------------------+------+--------+
| pos | id      | label                          | size | role   |
+-----+---------+--------------------------------+------+--------+
| 0   | concept | indicator                      | 1    | metric |
| 1   | area    | OECD countries, EU15 and total | 36   | geo    |
| 2   | year    | 2003-2014                      | 12   | time   |
+-----+---------+--------------------------------+------+--------+
oecd.dimension('concept')
+-----+-------+---------------------+
| pos | idx   | label               |
+-----+-------+---------------------+
| 0   | 'UNR' | 'unemployment rate' |
+-----+-------+---------------------+
oecd.dimension('area')
+-----+------+-------------+
| pos | idx  | label       |
+-----+------+-------------+
| 0   | 'AU' | 'Australia' |
| 1   | 'AT' | 'Austria'   |
| 2   | 'BE' | 'Belgium'   |
| 3   | 'CA' | 'Canada'    |
| ... | ...  | ...         |
+-----+------+-------------+
oecd.dimension('year')
+-----+--------+-------+
| pos | idx    | label |
+-----+--------+-------+
| 0   | '2003' | ''    |
| 1   | '2004' | ''    |
| 2   | '2005' | ''    |
| 3   | '2006' | ''    |
| ... | ...    | ...   |
+-----+--------+-------+

The output above shows some detailed info about the dimensions.

Accessing values in the dataset

Print the value in oecd dataset for area = IT and year = 2012

oecd.data(area='IT', year='2012')
JsonStatValue(idx=201, value=10.55546863, status=None)
oecd.value(area='IT', year='2012')
10.55546863
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128
5.39663128
oecd.value(concept='UNR',area='AU',year='2004')
5.39663128

Transforming the dataset into a pandas DataFrame

df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()
concept area Value
year
2003 UNR AU 5.943826
2004 UNR AU 5.396631
2005 UNR AU 5.044791
2006 UNR AU 4.789363
2007 UNR AU 4.379649
df_oecd['area'].describe() # area contains 36 values
count     432
unique     36
top        ES
freq       12
Name: area, dtype: object

Extract a subset of data into a pandas DataFrame from the jsonstat dataset. We can transform the dataset by freezing the dimension area to a specific country (Canada).

df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()
concept area Value
year
2010 UNR CA 7.988900
2011 UNR CA 7.453610
2012 UNR CA 7.323584
2013 UNR CA 7.169742
2014 UNR CA 6.881227
df_oecd_ca['area'].describe()  # area contains only one value (CA)
count     12
unique     1
top       CA
freq      12
Name: area, dtype: object
df_oecd_ca.plot(grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x114298198>
_images/oecd-canada-jsonstat_v2_23_1.png

Transforming a dataset into a python list

oecd.to_table()[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to transform jsonstat data into a table with the dimensions in a different order

order = [i.did() for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]

Notebook: using jsonstat.py with eurostat api

This Jupyter notebook shows the python library jsonstat.py in action. It shows how to explore datasets downloaded from a data provider. This notebook uses some datasets from Eurostat. Eurostat provides a REST api to download its datasets; you can find details about the api here. It is possible to use a query builder to discover the REST api parameters. The following image shows the query builder:

# all import here
from __future__ import print_function
import os
import pandas as pd
import jsonstat

import matplotlib.pyplot as plt
%matplotlib inline

1 - Exploring data with one dimension (time) with size > 1

The following cell downloads a dataset from Eurostat. If the file has already been downloaded, the copy present on disk is used. Caching the file avoids downloading the dataset every time the notebook runs; it speeds up development and provides consistent results. You can see the raw data here.
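The URL used in the next cell is simply the Eurostat REST endpoint followed by the dataset code and the filter parameters. A rough sketch of that structure (the build_eurostat_url helper is only for illustration; it is not part of jsonstat.py or of the Eurostat api):

# hypothetical helper, only to show how the REST request below is composed
def build_eurostat_url(dataset, **filters):
    base = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en'
    query = '&'.join('{}={}'.format(k, v) for k, v in filters.items())
    return '{}/{}?{}'.format(base, dataset, query)

build_eurostat_url('nama_gdp_c', precision=1, geo='IT', unit='EUR_HAB', indic_na='B1GM')
# e.g. 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&unit=EUR_HAB&indic_na=B1GM'
# (the order of the parameters in the query string may vary)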

url_1 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&unit=EUR_HAB&indic_na=B1GM'
file_name_1 = "eurostat-name_gpd_c-geo_IT.json"

file_path_1 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_1))
if os.path.exists(file_path_1):
    print("using already donwloaded file {}".format(file_path_1))
else:
    print("download file")
    jsonstat.download(url_1, file_name_1)
    file_path_1 = file_name_1
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT.json

Initialize JsonStatCollection with eurostat data and print some info about the collection.

collection_1 = jsonstat.from_file(file_path_1)
collection_1
JsonstatCollection contains the following JsonStatDataSet:
+-----+--------------+
| pos | dataset      |
+-----+--------------+
| 0   | 'nama_gdp_c' |
+-----+--------------+

The previous collection contains only one dataset, named 'nama_gdp_c'.

nama_gdp_c_1 = collection_1.dataset('nama_gdp_c')
nama_gdp_c_1
name: 'nama_gdp_c'
title: 'GDP and main components - Current prices'
size: 4
+-----+----------+----------+------+------+
| pos | id       | label    | size | role |
+-----+----------+----------+------+------+
| 0   | unit     | unit     | 1    |      |
| 1   | indic_na | indic_na | 1    |      |
| 2   | geo      | geo      | 1    |      |
| 3   | time     | time     | 69   |      |
+-----+----------+----------+------+------+

All dimensions of the dataset 'nama_gdp_c' are of size 1, with the exception of the time dimension. Let's explore the time dimension.

nama_gdp_c_1.dimension('time')
+-----+--------+--------+
| pos | idx    | label  |
+-----+--------+--------+
| 0   | '1946' | '1946' |
| 1   | '1947' | '1947' |
| 2   | '1948' | '1948' |
| 3   | '1949' | '1949' |
| ... | ...    | ...    |
+-----+--------+--------+

Get value for year 2012.

nama_gdp_c_1.value(time='2012')
25700

Convert the jsonstat data into a pandas dataframe.

df_1 = nama_gdp_c_1.to_data_frame('time', content='id')
df_1.tail()
unit indic_na geo Value
time
2010 EUR_HAB B1GM IT 25700.0
2011 EUR_HAB B1GM IT 26000.0
2012 EUR_HAB B1GM IT 25700.0
2013 EUR_HAB B1GM IT 25600.0
2014 EUR_HAB B1GM IT NaN

Adding a simple plot

df_1 = df_1.dropna() # remove rows with NaN values
df_1.plot(grid=True, figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x114bc12b0>
_images/eurostat_15_1.png

2 - Exploring data with two dimensions (geo, time) with size > 1

Download or use the jsonstat file cached on disk. The cache is used to avoid internet downloads during development and to make things a bit faster. You can see the raw data here.

url_2 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&geo=FR&unit=EUR_HAB&indic_na=B1GM'
file_name_2 = "eurostat-name_gpd_c-geo_IT_FR.json"

file_path_2 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_2))
if os.path.exists(file_path_2):
    print("using alredy donwloaded file {}".format(file_path_2))
else:
    print("download file and storing on disk")
    jsonstat.download(url_2, file_name_2)
    file_path_2 = file_name_2
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT_FR.json
collection_2 = jsonstat.from_file(file_path_2)
nama_gdp_c_2 = collection_2.dataset('nama_gdp_c')
nama_gdp_c_2
name: 'nama_gdp_c'
title: 'GDP and main components - Current prices'
size: 4
+-----+----------+----------+------+------+
| pos | id       | label    | size | role |
+-----+----------+----------+------+------+
| 0   | unit     | unit     | 1    |      |
| 1   | indic_na | indic_na | 1    |      |
| 2   | geo      | geo      | 2    |      |
| 3   | time     | time     | 69   |      |
+-----+----------+----------+------+------+
nama_gdp_c_2.dimension('geo')
+-----+------+----------+
| pos | idx  | label    |
+-----+------+----------+
| 0   | 'FR' | 'France' |
| 1   | 'IT' | 'Italy'  |
+-----+------+----------+
nama_gdp_c_2.value(time='2012',geo='IT')
25700
nama_gdp_c_2.value(time='2012',geo='FR')
31100
df_2 = nama_gdp_c_2.to_table(content='id',rtype=pd.DataFrame)
df_2.tail()
unit indic_na geo time Value
133 EUR_HAB B1GM IT 2010 25700.0
134 EUR_HAB B1GM IT 2011 26000.0
135 EUR_HAB B1GM IT 2012 25700.0
136 EUR_HAB B1GM IT 2013 25600.0
137 EUR_HAB B1GM IT 2014 NaN
df_FR_IT = df_2.dropna()[['time', 'geo', 'Value']]
df_FR_IT = df_FR_IT.pivot('time', 'geo', 'Value')
df_FR_IT.plot(grid=True, figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x114c0f0b8>
_images/eurostat_23_1.png
df_3 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'FR'})
df_3 = df_3.dropna()
df_3.plot(grid=True,figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x1178e7d30>
_images/eurostat_24_1.png
df_4 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'IT'})
df_4 = df_4.dropna()
df_4.plot(grid=True,figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x117947630>
_images/eurostat_25_1.png

Notebook: using jsonstat.py to explore ISTAT data (house price index)

This Jupyter notebook shows how to use the jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a REST api for querying italian statistics.

We start by importing some modules.

from __future__ import print_function
import os
import istat
from IPython.core.display import HTML

Step 1: using the istat module to get a jsonstat collection

The following code sets a cache dir where the json files downloaded from the Istat api are stored. Storing files on disk speeds up development and assures consistent results over time. You can always delete the files to download a fresh copy.

cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'

Using the istat api, we can show the istat areas used to categorize the datasets.

istat.areas()
id | desc
3  | 2011 Population and housing census
4  | Enterprises
7  | Environment and Energy
8  | Population and Households
9  | Households Economic Conditions and Disparities
10 | Health statistics
11 | Social Security and Welfare
12 | Education and training
13 | Communication, culture and leisure
14 | Justice and Security
15 | Citizens' opinions and satisfaction with life
16 | Social participation
17 | National Accounts
19 | Agriculture
20 | Industry and Construction
21 | Services
22 | Public Administrations and Private Institutions
24 | External Trade and Internationalisation
25 | Prices
26 | Labour

The following code lists all the datasets contained in the area Prices.

istat_area_prices = istat.area('Prices')
istat_area_prices.datasets()
cod | name | dim
DCSC_FABBRESID_1 | Construction costs index - monthly data | 5
DCSC_PREZPRODSERV_1 | Services producer prices index | 5
DCSC_PREZZPIND_1 | Producer price index for industrial products - monthly data | 6
DCSP_FOI1 | FOI – Monthly data until 2010 | 5
DCSP_FOI1B2010 | FOI - Monthly data from 2011 to 2015 | 5
DCSP_FOI1B2015 | FOI - Monthly data from 2016 onwards | 5
DCSP_FOI2 | FOI – Annual average until 2010 | 5
DCSP_FOI2B2010 | FOI – Annual average from 2011 onwards | 5
DCSP_FOI2B2015 | FOI - Annual average from 2016 onwards | 5
DCSP_FOI3 | FOI – Weights until 2010 | 4
DCSP_FOI3B2010 | FOI - Weights from 2011 to 2015 | 4
DCSP_FOI3B2015 | FOI - Weights from 2016 onwards | 4
DCSP_IPAB | House price index | 5
DCSP_IPCA1 | HICP - Monthly data from 2001 to 2015 (base 2005=100) | 5
DCSP_IPCA1B2015 | HICP - Monthly data from 2001 onwards (base 2015=100) | 5
DCSP_IPCA2 | HICP - Annual average from 2001 to 2015 (base 2005=100) | 5
DCSP_IPCA2B2015 | HICP - Annual average from 2001 onwards (base 2015=100) | 5
DCSP_IPCA3 | HICP – Weights from 2001 onwards | 4
DCSP_IPCATC1 | HICP at constant tax rates - Monthly data from 2002 to 2015 (base 2005=100) | 5
DCSP_IPCATC1B2015 | HICP at constant tax rates - Monthly data from 2002 onwards (base 2015=100) | 5
DCSP_IPCATC2 | HICP at constant tax rates - Annual average from 2002 to 2015 (base 2005=100) | 5
DCSP_IPCATC2B2015 | HICP at constant tax rates - Annual average from 2002 onwards (base 2015=100) | 5
DCSP_NIC1B2015 | NIC - Monthly data from 2016 onwards | 5
DCSP_NIC3B2015 | NIC - Weights from 2016 onwards | 4
DCSP_NICDUE | NIC – Annual average until 2010 | 5
DCSP_NICDUEB2010 | NIC – Annual average from 2011 onwards | 5
DCSP_NICTRE | NIC – Weights until 2010 | 4
DCSP_NICTREB2010 | NIC - Weights from 2011 to 2015 | 4
DCSP_NICUNOB | NIC – Monthly data until 2010 | 5
DCSP_NICUNOBB2010 | NIC - Monthly data from 2011 to 2015 | 5

List all dimensions of the dataset DCSP_IPAB (House price index).

istat_dataset_dcsp_ipab = istat_area_prices.dataset('DCSP_IPAB')
istat_dataset_dcsp_ipab
DCSP_IPAB(5):House price index
nr | name | nr. values | values (first 3 values)
0  | Territory | 1 | 1:'Italy'
1  | Index type | 3 | 18:'house price index (base 2010=100) - quarterly data', 19:'house price index (base 2010=100) - annual average', 20:'house price index (base 2010=100) - weights' ...
2  | Measure | 5 | 8:'annual average rate of change', 4:'index number', 22:'not applicable' ...
3  | Purchases of dwellings | 3 | 4:'H1 - all items', 5:'H11 - new dwellings', 6:'H12 - existing dwellings' ...
4  | Time and frequency | 29 | 2112:'Q1-2011', 2178:'Q3-2014', 2116:'Q2-2011' ...

Finally, from the istat dataset we extract data in jsonstat format by specifying the dimensions we are interested in.

spec = {
    "Territory": 1, "Index type": 18,
    # "Measure": 0, # "Purchases of dwelling": 0, # "Time and frequency": 0
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_dcsp_ipab.getvalues(spec)
collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+-------------------------------+
| pos | dataset                       |
+-----+-------------------------------+
| 0   | 'IDMISURA1*IDTYPPURCH*IDTIME' |
+-----+-------------------------------+

The previous call is equivalent to calling the istat api with the string of numbers "1,18,0,0,0". Below is the mapping between the numbers and the dimensions:

dimension              | value | meaning
Territory              | 1     | Italy
Index type             | 18    | house price index (base 2010=100) - quarterly data
Measure                | 0     | ALL
Purchases of dwellings | 0     | ALL
Time and frequency     | 0     | ALL
json_stat_data = istat_dataset_dcsp_ipab.getvalues("1,18,0,0,0")
json_stat_data
JsonstatCollection contains the following JsonStatDataSet:
+-----+-------------------------------+
| pos | dataset                       |
+-----+-------------------------------+
| 0   | 'IDMISURA1*IDTYPPURCH*IDTIME' |
+-----+-------------------------------+

Step 2: using the jsonstat.py api

Now that we have a jsonstat collection, let's explore it with the jsonstat.py api.

Print some info about one dataset contained in the above jsonstat collection.

jsonstat_dataset = collection.dataset('IDMISURA1*IDTYPPURCH*IDTIME')
jsonstat_dataset
name: 'IDMISURA1*IDTYPPURCH*IDTIME'
label: 'House price index by Measure, Purchases of dwellings and Time and frequency - Italy - house price index (base 2010=100) - quarterly data'
size: 207
+-----+------------+------------------------+------+------+
| pos | id         | label                  | size | role |
+-----+------------+------------------------+------+------+
| 0   | IDMISURA1  | Measure                | 3    |      |
| 1   | IDTYPPURCH | Purchases of dwellings | 3    |      |
| 2   | IDTIME     | Time and frequency     | 23   |      |
+-----+------------+------------------------+------+------+

Print info about the dimensions to get an idea about the data

jsonstat_dataset.dimension('IDMISURA1')
+-----+-----+---------------------------------------------------------------+
| pos | idx | label                                                         |
+-----+-----+---------------------------------------------------------------+
| 0   | '4' | 'index number'                                                |
| 1   | '6' | 'percentage changes on the previous period'                   |
| 2   | '7' | 'percentage changes on the same period of the previous year' |
+-----+-----+---------------------------------------------------------------+
jsonstat_dataset.dimension('IDTYPPURCH')
+-----+-----+----------------------------+
| pos | idx | label                      |
+-----+-----+----------------------------+
| 0   | '4' | 'H1 - all items'           |
| 1   | '5' | 'H11 - new dwellings'      |
| 2   | '6' | 'H12 - existing dwellings' |
+-----+-----+----------------------------+
jsonstat_dataset.dimension('IDTIME')
+-----+--------+-----------+
| pos | idx    | label     |
+-----+--------+-----------+
| 0   | '2093' | 'Q1-2010' |
| 1   | '2097' | 'Q2-2010' |
| 2   | '2102' | 'Q3-2010' |
| 3   | '2106' | 'Q4-2010' |
| ... | ...    | ...       |
+-----+--------+-----------+
import pandas as pd
df = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df.head()
Measure Purchases of dwellings Time and frequency Value
0 index number H1 - all items Q1-2010 99.5
1 index number H1 - all items Q2-2010 100.0
2 index number H1 - all items Q3-2010 100.3
3 index number H1 - all items Q4-2010 100.2
4 index number H1 - all items Q1-2011 100.1
filtered = df.loc[
    (df['Measure'] == 'index number') & (df['Purchases of dwellings'] == 'H1 - all items'),
    ['Time and frequency', 'Value']
]
filtered.set_index('Time and frequency')
Value
Time and frequency
Q1-2010 99.5
Q2-2010 100.0
Q3-2010 100.3
Q4-2010 100.2
Q1-2011 100.1
Q2-2011 101.2
Q3-2011 101.2
Q4-2011 100.5
Q1-2012 99.9
Q2-2012 99.1
Q3-2012 97.4
Q4-2012 95.3
Q1-2013 93.9
Q2-2013 93.3
Q3-2013 91.9
Q4-2013 90.2
Q1-2014 89.3
Q2-2014 88.7
Q3-2014 88.3
Q4-2014 86.9
Q1-2015 86.1
Q2-2015 86.1
Q3-2015 86.3
%matplotlib inline
import matplotlib.pyplot as plt

values = filtered['Value'].tolist()
labels = filtered['Time and frequency']

xs = [i + 0.1 for i, _ in enumerate(values)]
# bars are by default width 0.8, so we'll add 0.1 to the left coordinates
# so that each bar is centered

# plot bars with left x-coordinates xs and heights values
plt.figure(figsize=(15,4))
plt.bar(xs, values)
plt.ylabel("value")
plt.title("house index")

# label the x-axis with the quarter labels at the bar centers
plt.xticks([i + 0.5 for i, _ in enumerate(labels)], labels, rotation='vertical')
plt.show()
_images/istat_house_price_index_25_0.png

Notebook: using jsonstat.py to explore ISTAT data (unemployment)

This Jupyter notebook shows how to use jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a rest api for browsing italian statistics. This api can return results in jsonstat format.

from __future__ import print_function
import os
import pandas as pd
from IPython.core.display import HTML
import matplotlib.pyplot as plt
%matplotlib inline

import istat

Using istat api

The next step is to set a cache dir where the json files downloaded from Istat are stored. Storing files on disk speeds up development and assures consistent results over time. If needed, you can delete the downloaded files to get a fresh copy.

cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached")) # you could choose /tmp
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'

List all istat areas

istat.areas()
id | desc
3  | 2011 Population and housing census
4  | Enterprises
7  | Environment and Energy
8  | Population and Households
9  | Households Economic Conditions and Disparities
10 | Health statistics
11 | Social Security and Welfare
12 | Education and training
13 | Communication, culture and leisure
14 | Justice and Security
15 | Citizens' opinions and satisfaction with life
16 | Social participation
17 | National Accounts
19 | Agriculture
20 | Industry and Construction
21 | Services
22 | Public Administrations and Private Institutions
24 | External Trade and Internationalisation
25 | Prices
26 | Labour

List all datasets contained in the area LAB (Labour).

istat_area_lab = istat.area('LAB')
istat_area_lab
IstatArea: cod = LAB description = Labour

List all dimensions of the dataset DCCV_TAXDISOCCU (Unemployment rate).

istat_dataset_taxdisoccu = istat_area_lab.dataset('DCCV_TAXDISOCCU')
istat_dataset_taxdisoccu
DCCV_TAXDISOCCU(9):Unemployment rate
nr | name | nr. values | values (first 3 values)
0  | Territory | 136 | 1:'Italy', 3:'Nord', 4:'Nord-ovest' ...
1  | Data type | 1 | 6:'unemployment rate'
2  | Measure | 1 | 1:'percentage values'
3  | Gender | 3 | 1:'males', 2:'females', 3:'total' ...
4  | Age class | 14 | 32:'18-29 years', 3:'20-24 years', 4:'15-24 years' ...
5  | Highest level of education attained | 5 | 11:'tertiary (university, doctoral and specialization courses)', 12:'total', 3:'primary school certificate, no educational degree' ...
6  | Citizenship | 3 | 1:'italian', 2:'foreign', 3:'total' ...
7  | Duration of unemployment | 2 | 2:'12 months and more', 3:'total'
8  | Time and frequency | 193 | 1536:'Q4-1980', 2049:'Q4-2007', 1540:'1981' ...

Extract data from dataset DCCV_TAXDISOCCU

spec = {
    "Territory": 0,                            # 1 Italy
    "Data type": 6,                            # (6:'unemployment rate')
    'Measure': 1,                              # 1 : 'percentage values'
    'Gender': 3,                               # 3 total
    'Age class':31,                            # 31:'15-74 years'
    'Highest level of education attained': 12, # 12:'total',
    'Citizenship': 3,                          # 3:'total')
    'Duration of unemployment': 3,             # 3:'total'
    'Time and frequency': 0                    # All
}

# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_taxdisoccu.getvalues(spec)
collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+---------------------+
| pos | dataset             |
+-----+---------------------+
| 0   | 'IDITTER107*IDTIME' |
+-----+---------------------+

Print some info about the only dataset contained in the above jsonstat collection.

jsonstat_dataset = collection.dataset(0)
jsonstat_dataset
name: 'IDITTER107*IDTIME'
label: 'Unemployment rate by Territory and Time and frequency - unemployment rate - percentage values - 15-74 years'
size: 7830
+-----+------------+--------------------+------+------+
| pos | id         | label              | size | role |
+-----+------------+--------------------+------+------+
| 0   | IDITTER107 | Territory          | 135  |      |
| 1   | IDTIME     | Time and frequency | 58   |      |
+-----+------------+--------------------+------+------+
df_all = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df_all.head()
Territory Time and frequency Value
0 Italy 2004 8.01
1 Italy Q1-2004 8.68
2 Italy Q2-2004 7.88
3 Italy Q3-2004 7.33
4 Italy Q4-2004 8.17
df_all.pivot('Territory', 'Time and frequency', 'Value').head()
Time and frequency 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 ... Q4-2005 Q4-2006 Q4-2007 Q4-2008 Q4-2009 Q4-2010 Q4-2011 Q4-2012 Q4-2013 Q4-2014
Territory
Abruzzo 7.71 7.88 6.57 6.17 6.63 7.97 8.67 8.59 10.85 11.29 ... 6.95 6.84 5.87 6.67 7.02 9.15 9.48 10.48 11.21 12.08
Agrigento 20.18 17.62 13.40 16.91 16.72 17.43 19.42 17.61 19.48 20.98 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Alessandria 5.34 5.37 4.65 4.63 4.85 5.81 5.34 6.66 10.48 11.80 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Ancona 5.11 4.14 4.05 3.49 3.78 5.82 4.94 6.84 9.20 11.27 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Arezzo 4.55 5.50 4.88 4.61 4.91 5.51 5.87 6.04 7.33 8.04 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 58 columns

spec = {
    "Territory": 1,                            # 1 Italy
    "Data type": 6,                            # (6:'unemployment rate')
    'Measure': 1,
    'Gender': 3,
    'Age class':0,                             # all classes
    'Highest level of education attained': 12, # 12:'total',
    'Citizenship': 3,                          # 3:'total')
    'Duration of unemployment': 3,             #  3:'total')
    'Time and frequency': 0                    # All
}

# convert istat dataset into jsonstat collection and print some info
collection_2 = istat_dataset_taxdisoccu.getvalues(spec)
collection_2
JsonstatCollection contains the following JsonStatDataSet:
+-----+----------------------+
| pos | dataset              |
+-----+----------------------+
| 0   | 'IDCLASETA28*IDTIME' |
+-----+----------------------+
df = collection_2.dataset(0).to_table(rtype=pd.DataFrame, blocked_dims={'IDCLASETA28':'31'})
df.head(6)
Age class Time and frequency Value
0 15-74 years Q4-1992 NaN
1 15-74 years 1993 NaN
2 15-74 years Q1-1993 NaN
3 15-74 years Q2-1993 NaN
4 15-74 years Q3-1993 NaN
5 15-74 years Q4-1993 NaN
df = df.dropna()
df = df[df['Time and frequency'].str.contains(r'^Q.*')]
# df = df.set_index('Time and frequency')
df.head(6)
Age class Time and frequency Value
57 15-74 years Q1-2004 8.68
58 15-74 years Q2-2004 7.88
59 15-74 years Q3-2004 7.33
60 15-74 years Q4-2004 8.17
62 15-74 years Q1-2005 8.27
63 15-74 years Q2-2005 7.54
df.plot(x='Time and frequency',y='Value', figsize=(18,4))
<matplotlib.axes._subplots.AxesSubplot at 0x1184b1908>
_images/istat_unemployment_19_1.png
fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(111)
plt.grid(True)
df.plot(x='Time and frequency',y='Value', ax=ax, grid=True)
# kind='barh', , alpha=a, legend=False, color=customcmap,
# edgecolor='w', xlim=(0,max(df['population'])), title=ttl)
<matplotlib.axes._subplots.AxesSubplot at 0x11a898b70>
_images/istat_unemployment_20_1.png
# plt.figure(figsize=(7,4))
# plt.plot(df['Time and frequency'],df['Value'], lw=1.5, label='1st')
# plt.plot(y[:,1], lw=1.5, label='2st')
# plt.plot(y,'ro')
# plt.grid(True)
# plt.legend(loc=0)
# plt.axis('tight')
# plt.xlabel('index')
# plt.ylabel('value')
# plt.title('a simple plot')
# labour force (forza lavoro)
istat_forzlv = istat.dataset('LAB', 'DCCV_FORZLV')

spec = {
    "Territory": 'Italy',
    "Data type": 'number of labour force 15 years and more (thousands)',                            #
    'Measure':   'absolute values',
    'Gender':    'total',
    'Age class': '15 years and over',
    'Highest level of education attained': 'total',
    'Citizenship': 'total',
    'Time and frequency': 0
}

df_forzlv = istat_forzlv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_forzlv = df_forzlv.dropna()
df_forzlv = df_forzlv[df_forzlv['Time and frequency'].str.contains(r'^Q.*')]
df_forzlv.tail(6)
Time and frequency Value
187 Q2-2014 25419.15
188 Q3-2014 25373.70
189 Q4-2014 25794.44
190 Q1-2015 25460.25
191 Q2-2015 25598.29
192 Q3-2015 25321.61
istat_inattiv = istat.dataset('LAB', 'DCCV_INATTIV')
# HTML(istat_inattiv.info_dimensions_as_html())
spec = {
    "Territory": 'Italy',
    "Data type": 'number of inactive persons',
    'Measure':   'absolute values',
    'Gender':    'total',
    'Age class': '15 years and over',
    'Highest level of education attained': 'total',
    'Time and frequency': 0
}

df_inattiv = istat_inattiv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_inattiv = df_inattiv.dropna()
df_inattiv = df_inattiv[df_inattiv['Time and frequency'].str.contains(r'^Q.*')]
df_inattiv.tail(6)
citizenship Labour status Inactivity reasons Main status Time and frequency Value
24756 total total total total Q2-2014 26594.57
24757 total total total total Q3-2014 26646.90
24758 total total total total Q4-2014 26257.15
24759 total total total total Q1-2015 26608.07
24760 total total total total Q2-2015 26487.67
24761 total total total total Q3-2015 26746.26

Notebook: using jsonstat.py to explore ISTAT data (unemployment)

This Jupyter notebook shows how to use the jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a REST api for browsing italian statistics. This api can return results in jsonstat format.

The labour force is made up of the employed and the unemployed. The population aged 15 and over is made up of the labour force plus the inactive.

\[Population = LabourForce + Inactive\]
\[LabourForce = Employed + Unemployed\]
\[Inactive = NotWillingToWork + Discouraged\]

Unemployment rate = Unemployed / Labour force

Download datasets from Istat

from __future__ import print_function
import os
import pandas as pd
from IPython.core.display import HTML
import matplotlib.pyplot as plt
%matplotlib inline

import istat

# Next step is to set a cache dir where to store json files downloaded from Istat.
# Storing file on disk speeds up development, and assures consistent results over time.
# If needed, you can delete the downloaded files to get a fresh copy.

cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
istat.lang(0)  # set italian language
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'
# List all datasets contained in the area `LAB` (Labour)
istat.area('LAB').datasets()
cod | name | dim
DCCV_COMPL | Indicatori complementari | 12
DCCV_DISOCCUPT | Disoccupati | 10
DCCV_DISOCCUPTDE | Disoccupati - dati destagionalizzati | 7
DCCV_DISOCCUPTMENS | Disoccupati - dati mensili | 8
DCCV_FORZLV | Forze di lavoro | 8
DCCV_FORZLVDE | Forze di lavoro - dati destagionalizzati | 7
DCCV_FORZLVMENS | Forze lavoro - dati mensili | 8
DCCV_INATTIV | Inattivi | 11
DCCV_INATTIVDE | Inattivi - dati destagionalizzati | 7
DCCV_INATTIVMENS | Inattivi - dati mensili | 8
DCCV_NEET | NEET (giovani non occupati e non in istruzione e formazione) | 10
DCCV_OCCUPATIMENS | Occupati - dati mensili | 8
DCCV_OCCUPATIT | Occupati | 14
DCCV_OCCUPATITDE | Occupati - dati destagionalizzati | 8
DCCV_ORELAVMED | Occupati per ore settimanali lavorate e numero di ore settimanali lavorate procapite | 12
DCCV_TAXATVT | Tasso di attività | 8
DCCV_TAXATVTDE | Tasso di attività - dati destagionalizzati | 7
DCCV_TAXATVTMENS | Tasso di attività - dati mensili | 8
DCCV_TAXDISOCCU | Tasso di disoccupazione | 9
DCCV_TAXDISOCCUDE | Tasso di disoccupazione - dati destagionalizzati | 7
DCCV_TAXDISOCCUMENS | Tasso di disoccupazione - dati mensili | 8
DCCV_TAXINATT | Tasso di inattività | 8
DCCV_TAXINATTDE | Tasso di inattività - dati destagionalizzati | 7
DCCV_TAXINATTMENS | Tasso di inattività - dati mensili | 8
DCCV_TAXOCCU | Tasso di occupazione | 8
DCCV_TAXOCCUDE | Tasso di occupazione - dati destagionalizzati | 7
DCCV_TAXOCCUMENS | Tasso di occupazione - dati mensili | 8
DCIS_RICSTAT | Ricostruzione statistica delle serie regionali di popolazione del periodo 1/1/2002-1/1/2014 | 6
DCSC_COSTLAVSTRUT_1 | Struttura del costo del lavoro (indagine quadriennale) | 6
DCSC_COSTLAVULAOROS_1 | Indicatori del costo del lavoro per Ula - dati trimestrali | 5
DCSC_GI_COS | Costo del lavoro nelle imprese con almeno 500 dipendenti - dati mensili | 6
DCSC_GI_OCC | Occupazione dipendente, tassi di ingresso e uscita nelle imprese con almeno 500 dipendenti - dati mensili | 6
DCSC_GI_ORE | Ore lavorate nelle imprese con almeno 500 dipendenti - dati mensili | 6
DCSC_GI_RE | Retribuzione lorda nelle imprese con almeno 500 dipendenti - dati mensili | 6
DCSC_ORE10_1 | Ore lavorate nelle imprese con almeno 10 dipendenti - dati trimestrali | 5
DCSC_OROS_1 | Indice delle posizioni lavorative alle dipendenze - dati trimestrali | 5
DCSC_POSTIVAC_1 | Tasso di posti vacanti - dati trimestrali | 5
DCSC_RETRATECO1 | Retribuzioni contrattuali per Ateco 2007 | 6
DCSC_RETRCASSCOMPPA | Retribuzione contrattuale di cassa e di competenza per dipendente della pubblica amministrazione per contratto - dati annuali - euro | 7
DCSC_RETRCONTR1C | Retribuzioni contrattuali per contratto - dati mensili e annuali | 6
DCSC_RETRCONTR1O | Orario contrattuale annuo lordo, netto, ferie e altre ore di riduzione | 6
DCSC_RETRCONTR1T | Indicatori di tensione contrattuale - dati mensili e annuali | 6
DCSC_RETRULAOROS_1 | Indice delle retribuzioni lorde per Ula - dati trimestrali | 5

Download the number of employed persons, the number of unemployed persons and the labour force, then check that employed + unemployed = labour force.

Download Occupati (employed persons)

# DCCV_OCCUPATIT
istat_occupatit = istat.dataset('LAB', 'DCCV_OCCUPATIT')
# HTML(istat_occupatit.info_dimensions_as_html(show_values=0))
spec = {
    'Territorio':              'Italia',
    'Sesso':                   'totale',
    'Classe di età':           '15 anni e più',
    'Titolo di studio':        'totale',
    'Cittadinanza':            'totale',
    'Ateco 2002' :             '0010 totale',
    'Ateco 2007' :             '0010 totale',
    'Posizione professionale': 'totale',
    'Profilo professionale':   'totale',
    'Professione 2001':        'totale',
    'Professione 2011':        'totale',
    'Regime orario':           'totale',
    'Carattere occupazione':   'totale',
    'Tempo e frequenza': 0
}

df_occupatit = istat_occupatit.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_occupatit = df_occupatit[df_occupatit['Tempo e frequenza'].str.contains(r'^T.*')]  # keep only quarterly rows
df_occupatit.tail(6)
Tempo e frequenza Value
187 T2-2014 22316.76
188 T3-2014 22398.30
189 T4-2014 22374.93
190 T1-2015 22158.45
191 T2-2015 22496.79
192 T3-2015 22645.07
df_occupatit.ix[192]
Tempo e frequenza    T3-2015
Value                22645.1
Name: 192, dtype: object

Download Disoccupati (unemployed persons)

istat_disoccupt = istat.dataset('LAB', 'DCCV_DISOCCUPT')
istat_disoccupt
DCCV_DISOCCUPT(10):Disoccupati
nr | name | nr. values | values (first 3 values)
0  | Territorio | 136 | 1:'Italia', 3:'Nord', 4:'Nord-ovest' ...
1  | Tipo dato | 1 | 2:'numero di persone in cerca di occupazione 15 anni e oltre (valori in migliaia)'
2  | Misura | 1 | 9:'valori assoluti'
3  | Sesso | 3 | 1:'maschi', 2:'femmine', 3:'totale' ...
4  | Classe di età | 11 | 17:'45-54 anni', 4:'15-24 anni', 21:'55-64 anni' ...
5  | Titolo di studio | 5 | 11:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio' ...
6  | Cittadinanza | 3 | 1:'italiano-a', 2:'straniero-a', 3:'totale' ...
7  | Condizione professionale | 4 | 3:'disoccupati ex-occupati', 4:'disoccupati ex-inattivi', 5:'disoccupati senza esperienza di lavoro' ...
8  | Durata disoccupazione | 2 | 2:'12 mesi o più', 3:'totale'
9  | Tempo e frequenza | 193 | 1536:'T4-1980', 2049:'T4-2007', 1540:'1981' ...
spec = {
    'Territorio':               'Italia',
    'Tipo dato' :               'numero di persone in cerca di occupazione 15 anni e oltre (valori in migliaia)',
    'Misura':                   'valori assoluti',
    'Sesso':                    'totale',
    'Classe di età':            '15 anni e più',
    'Titolo di studio':         'totale',
    'Cittadinanza':             'totale',
    'Condizione professionale': 'totale',
    'Durata disoccupazione':    'totale',
    'Tempo e frequenza': 0
}

df_disoccupt = istat_disoccupt.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_disoccupt = df_disoccupt[df_disoccupt['Tempo e frequenza'].str.contains(r'^T.*')]  # keep only quarterly rows
df_disoccupt.tail(6)
Tempo e frequenza Value
187 T2-2014 3102.39
188 T3-2014 2975.40
189 T4-2014 3419.51
190 T1-2015 3301.81
191 T2-2015 3101.50
192 T3-2015 2676.55

Download Forze di lavoro (labour force)

istat_forzlv = istat.dataset('LAB', 'DCCV_FORZLV')
istat_forzlv
DCCV_FORZLV(8):Forze di lavoro
nr | name | nr. values | values (first 3 values)
0  | Territorio | 136 | 1:'Italia', 3:'Nord', 4:'Nord-ovest' ...
1  | Tipo dato | 1 | 3:'numero di forze di lavoro15 anni e oltre (valori in migliaia)'
2  | Misura | 1 | 9:'valori assoluti'
3  | Sesso | 3 | 1:'maschi', 2:'femmine', 3:'totale' ...
4  | Classe di età | 10 | 17:'45-54 anni', 4:'15-24 anni', 21:'55-64 anni' ...
5  | Titolo di studio | 5 | 11:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio' ...
6  | Cittadinanza | 3 | 1:'italiano-a', 2:'straniero-a', 3:'totale' ...
7  | Tempo e frequenza | 193 | 1536:'T4-1980', 2049:'T4-2007', 1540:'1981' ...
spec = {
    'Territorio':       'Italia',
    'Tipo dato':        'numero di forze di lavoro15 anni e oltre (valori in migliaia)',
    'Misura':           'valori assoluti',
    'Sesso':            'totale',
    'Classe di età':    '15 anni e più',
    'Titolo di studio': 'totale',
    'Cittadinanza':     'totale',
    'Tempo e frequenza': 0
}

df_forzlv = istat_forzlv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
# df_forzlv
# df_forzlv = df_forzlv.dropna()
df_forzlv = df_forzlv[df_forzlv['Tempo e frequenza'].str.contains(r'^T.*')]
df_forzlv.tail(6)
Tempo e frequenza Value
187 T2-2014 25419.15
188 T3-2014 25373.70
189 T4-2014 25794.44
190 T1-2015 25460.25
191 T2-2015 25598.29
192 T3-2015 25321.61
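At this point we can do the check announced above: employed plus unemployed should match the labour force. A minimal sketch, assuming the df_occupatit, df_disoccupt and df_forzlv DataFrames built in the previous cells (small differences are expected because the published values are rounded):

check = (df_occupatit.rename(columns={'Value': 'occupati'})
         .merge(df_disoccupt.rename(columns={'Value': 'disoccupati'}), on='Tempo e frequenza')
         .merge(df_forzlv.rename(columns={'Value': 'forze_lavoro'}), on='Tempo e frequenza'))
check['delta'] = check['occupati'] + check['disoccupati'] - check['forze_lavoro']
print(check['delta'].abs().max())
# e.g. for T3-2015: 22645.07 + 2676.55 - 25321.61 = 0.01, i.e. only rounding noise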

Download Inattivi (inactive persons)

istat_inattiv = istat.dataset('LAB', 'DCCV_INATTIV')
istat.options.display.max_rows = 0
# HTML(istat_inattiv.info_dimensions_as_html(show_values=0))
istat_inattiv
DCCV_INATTIV(11):Inattivi
nr | name | nr. values | values (all values)
0Territorio1361:'Italia', 3:'Nord', 4:'Nord-ovest', 5:'Piemonte', 6:'Torino', 7:'Vercelli', 8:'Biella', 9:'Verbano-Cusio-Ossola', 10:'Novara', 11:'Cuneo', 12:'Asti', 13:'Alessandria', 14:'Valle d'Aosta / Vallée d'Aoste', 15:'Valle d'Aosta / Vallée d'Aoste', 16:'Liguria', 17:'Imperia', 18:'Savona', 19:'Genova', 20:'La Spezia', 21:'Lombardia', 22:'Varese', 23:'Como', 24:'Lecco', 25:'Sondrio', 26:'Milano', 27:'Bergamo', 28:'Brescia', 29:'Pavia', 30:'Lodi', 31:'Cremona', 32:'Mantova', 33:'Nord-est', 34:'Trentino Alto Adige / Südtirol', 35:'Provincia Autonoma Bolzano / Bozen', 37:'Provincia Autonoma Trento', 39:'Veneto', 40:'Verona', 41:'Vicenza', 42:'Belluno', 43:'Treviso', 44:'Venezia', 45:'Padova', 46:'Rovigo', 47:'Friuli-Venezia Giulia', 48:'Pordenone', 49:'Udine', 50:'Gorizia', 51:'Trieste', 52:'Emilia-Romagna', 53:'Piacenza', 54:'Parma', 55:'Reggio nell'Emilia', 56:'Modena', 57:'Bologna', 58:'Ferrara', 59:'Ravenna', 60:'Forlì-Cesena', 61:'Rimini', 62:'Centro', 63:'Toscana', 64:'Massa-Carrara', 65:'Lucca', 66:'Pistoia', 67:'Firenze', 68:'Prato', 69:'Livorno', 70:'Pisa', 71:'Arezzo', 72:'Siena', 73:'Grosseto', 74:'Umbria', 75:'Perugia', 76:'Terni', 77:'Marche', 78:'Pesaro e Urbino', 79:'Ancona', 80:'Macerata', 81:'Ascoli Piceno', 82:'Lazio', 83:'Viterbo', 84:'Rieti', 85:'Roma', 86:'Latina', 87:'Frosinone', 88:'Mezzogiorno', 90:'Abruzzo', 91:'L'Aquila', 92:'Teramo', 93:'Pescara', 94:'Chieti', 95:'Molise', 96:'Isernia', 97:'Campobasso', 98:'Campania', 99:'Caserta', 100:'Benevento', 101:'Napoli', 102:'Avellino', 103:'Salerno', 104:'Puglia', 105:'Foggia', 106:'Bari', 107:'Taranto', 108:'Brindisi', 109:'Lecce', 110:'Basilicata', 111:'Potenza', 112:'Matera', 113:'Calabria', 114:'Cosenza', 115:'Crotone', 116:'Catanzaro', 117:'Vibo Valentia', 118:'Reggio di Calabria', 120:'Sicilia', 121:'Trapani', 122:'Palermo', 123:'Messina', 124:'Agrigento', 125:'Caltanissetta', 126:'Enna', 127:'Catania', 128:'Ragusa', 129:'Siracusa', 130:'Sardegna', 131:'Sassari', 132:'Nuoro', 133:'Cagliari', 134:'Oristano', 135:'Olbia-Tempio', 136:'Ogliastra', 137:'Medio Campidano', 138:'Carbonia-Iglesias', 146:'Monza e della Brianza', 147:'Fermo', 148:'Barletta-Andria-Trani'
1Tipo dato23:'numero di forze di lavoro15 anni e oltre (valori in migliaia)', 4:'numero di inattivi (valori in migliaia)'
2Misura19:'valori assoluti'
3Sesso31:'maschi', 2:'femmine', 3:'totale'
4Classe di età121:'0-14 anni', 4:'15-24 anni', 7:'15-34 anni', 8:'25-34 anni', 10:'35-64 anni', 14:'35-44 anni', 17:'45-54 anni', 21:'55-64 anni', 22:'15-64 anni', 25:'65 anni e più', 28:'15 anni e più', 29:'totale'
5Titolo di studio511:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio', 4:'licenza di scuola media', 7:'diploma'
6Cittadinanza31:'italiano-a', 2:'straniero-a', 3:'totale'
7Condizione professionale96:'inattivi in età lavorativa', 7:'cercano lavoro non attivamente', 8:'cercano lavoro ma non disponibili a lavorare', 9:'non cercano ma disponibili a lavorare', 10:'non cercano e non disponibili a lavorare', 11:'inattivi in età non lavorativa', 12:'non forze di lavoro fino a 14 anni', 13:'non forze di lavoro di 65 anni e più', 14:'totale'
8Motivo inattività71:'scoraggiamento', 2:'motivi familiari', 3:'studio, formazione professionale', 4:'aspetta esiti passate azioni di ricerca', 5:'pensione, non interessa anche per motivi di età', 6:'altri motivi', 7:'totale'
9Condizione dichiarata81:'occupato', 6:'disoccupato alla ricerca di nuova occupazione', 7:'in cerca di prima occupazione', 8:'casalinga-o', 9:'studente', 10:'ritirato-a dal lavoro', 11:'in altra condizione', 12:'totale'
10Tempo e frequenza1931536:'T4-1980', 2049:'T4-2007', 1540:'1981', 2053:'2008', 1542:'T1-1981', 2055:'T1-2008', 1546:'T2-1981', 2059:'T2-2008', 1551:'T3-1981', 2064:'T3-2008', 1555:'T4-1981', 2068:'T4-2008', 1559:'1982', 2072:'2009', 1561:'T1-1982', 2074:'T1-2009', 1565:'T2-1982', 2078:'T2-2009', 1570:'T3-1982', 2083:'T3-2009', 1574:'T4-1982', 2087:'T4-2009', 1578:'1983', 2091:'2010', 1580:'T1-1983', 2093:'T1-2010', 1584:'T2-1983', 2097:'T2-2010', 1589:'T3-1983', 2102:'T3-2010', 1593:'T4-1983', 2106:'T4-2010', 1597:'1984', 2110:'2011', 1599:'T1-1984', 2112:'T1-2011', 1603:'T2-1984', 2116:'T2-2011', 1608:'T3-1984', 2121:'T3-2011', 1612:'T4-1984', 2125:'T4-2011', 1616:'1985', 2129:'2012', 1618:'T1-1985', 2131:'T1-2012', 1622:'T2-1985', 2135:'T2-2012', 1627:'T3-1985', 2140:'T3-2012', 1631:'T4-1985', 2144:'T4-2012', 1635:'1986', 2148:'2013', 1637:'T1-1986', 2150:'T1-2013', 1641:'T2-1986', 2154:'T2-2013', 1646:'T3-1986', 2159:'T3-2013', 1650:'T4-1986', 2163:'T4-2013', 1654:'1987', 2167:'2014', 1656:'T1-1987', 2169:'T1-2014', 1660:'T2-1987', 2173:'T2-2014', 1665:'T3-1987', 2178:'T3-2014', 1669:'T4-1987', 2182:'T4-2014', 1673:'1988', 1675:'T1-1988', 2188:'T1-2015', 1679:'T2-1988', 2192:'T2-2015', 1684:'T3-1988', 2197:'T3-2015', 1688:'T4-1988', 1692:'1989', 1694:'T1-1989', 1698:'T2-1989', 1703:'T3-1989', 1707:'T4-1989', 1711:'1990', 1713:'T1-1990', 1717:'T2-1990', 1722:'T3-1990', 1726:'T4-1990', 1730:'1991', 1732:'T1-1991', 1736:'T2-1991', 1741:'T3-1991', 1745:'T4-1991', 1749:'1992', 1751:'T1-1992', 1755:'T2-1992', 1760:'T3-1992', 1764:'T4-1992', 1768:'1993', 1770:'T1-1993', 1774:'T2-1993', 1779:'T3-1993', 1783:'T4-1993', 1787:'1994', 1789:'T1-1994', 1793:'T2-1994', 1798:'T3-1994', 1802:'T4-1994', 1806:'1995', 1808:'T1-1995', 1812:'T2-1995', 1817:'T3-1995', 1821:'T4-1995', 1825:'1996', 1827:'T1-1996', 1831:'T2-1996', 1836:'T3-1996', 1840:'T4-1996', 1844:'1997', 1846:'T1-1997', 1850:'T2-1997', 1855:'T3-1997', 1859:'T4-1997', 1863:'1998', 1865:'T1-1998', 1869:'T2-1998', 1874:'T3-1998', 1878:'T4-1998', 1882:'1999', 1884:'T1-1999', 1888:'T2-1999', 1893:'T3-1999', 1897:'T4-1999', 1901:'2000', 1903:'T1-2000', 1907:'T2-2000', 1912:'T3-2000', 1916:'T4-2000', 1920:'2001', 1922:'T1-2001', 1926:'T2-2001', 1931:'T3-2001', 1935:'T4-2001', 1939:'2002', 1941:'T1-2002', 1945:'T2-2002', 1950:'T3-2002', 1954:'T4-2002', 1958:'2003', 1960:'T1-2003', 1964:'T2-2003', 1969:'T3-2003', 1973:'T4-2003', 1464:'1977', 1977:'2004', 1466:'T1-1977', 1979:'T1-2004', 1470:'T2-1977', 1983:'T2-2004', 1475:'T3-1977', 1988:'T3-2004', 1479:'T4-1977', 1992:'T4-2004', 1483:'1978', 1996:'2005', 1485:'T1-1978', 1998:'T1-2005', 1489:'T2-1978', 2002:'T2-2005', 1494:'T3-1978', 2007:'T3-2005', 1498:'T4-1978', 2011:'T4-2005', 1502:'1979', 2015:'2006', 1504:'T1-1979', 2017:'T1-2006', 1508:'T2-1979', 2021:'T2-2006', 1513:'T3-1979', 2026:'T3-2006', 1517:'T4-1979', 2030:'T4-2006', 1521:'1980', 2034:'2007', 1523:'T1-1980', 2036:'T1-2007', 1527:'T2-1980', 2040:'T2-2007', 1532:'T3-1980', 2045:'T3-2007'
spec = {
    'Territorio':        'Italia',
    'Tipo dato':         'numero di inattivi (valori in migliaia)',
    'Misura':            'valori assoluti',
    'Sesso':             'totale',
    'Classe di età':     '15 anni e più',
    'Titolo di studio':  'totale',
    'Cittadinanza' : 'totale',
    'Condizione professionale': 'totale',
    'Motivo inattività': 'totale',
    'Condizione dichiarata': 'totale',
    'Tempo e frequenza': 0
}

df_inattiv = istat_inattiv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
# df_inattiv
df_inattiv = df_inattiv[df_inattiv['Tempo e frequenza'].str.contains(r'^T.*')]
df_inattiv.tail(6)
Tempo e frequenza Value
187 T2-2014 26594.57
188 T3-2014 26646.90
189 T4-2014 26257.15
190 T1-2015 26608.07
191 T2-2015 26487.67
192 T3-2015 26746.26

Tutorial

The parrot module is a module about parrots.

Doctest example:

>>> 2 + 2
4

Test-Output example:

json_string = '''
{
    "label" : "concepts",
    "category" : {
        "index" : {
            "POP" : 0,
            "PERCENT" : 1
        },
        "label" : {
            "POP" : "population",
            "PERCENT" : "weight of age group in the population"
        },
        "unit" : {
            "POP" : {
                "label": "thousands of persons",
                "decimals": 1,
                "type" : "count",
                "base" : "people",
                "multiplier" : 3
            },
            "PERCENT" : {
                "label" : "%",
                "decimals": 1,
                "type" : "ratio",
                "base" : "per cent",
                "multiplier" : 0
            }
        }
    }
}
'''
print(2 + 2)

This would output:

4

Api Reference

Jsonstat Module

jsonstat module contains classes and utility functions to parse jsonstat data format.

Utility functions

jsonstat.from_file(filename)[source]

read a file containing a jsonstat format and return the appropriate object

Parameters:filename – file containing a jsonstat
Returns:a JsonStatCollection, JsonStatDataset or JsonStatDimension object

example

>>> import os, jsonstat
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> o = jsonstat.from_file(filename)
>>> type(o)
<class 'jsonstat.collection.JsonStatCollection'>
jsonstat.from_url(url, pathname=None)[source]

Download a url and return the downloaded content.

see jsonstat.download() for how to use pathname parameter.

Parameters:
  • url – the url to download
  • pathname – the downloaded content will be stored in the file <cache_dir>/pathname. If pathname is None the filename is automatically generated. If pathname is an absolute path, cache_dir is ignored.

Returns:the contents of url

To set the dir where downloaded files are stored see jsonstat.cache_dir(). The cache expiration policy can be customized (see the time_to_live parameter of jsonstat.cache_dir()).

example:

>>> import jsonstat
>>> # cache_dir = os.path.normpath(os.path.join(jsonstat.__fixtures_dir, "json-stat.org"))
>>> # download external content into the /tmp dir so next downloads can be faster
>>> uri = 'http://json-stat.org/samples/oecd-canada.json'
>>> jsonstat.cache_dir("/tmp")
'/tmp'
>>> o = jsonstat.from_url(uri)
>>> print(o)
JsonstatCollection contains the following JsonStatDataSet:
+-----+----------+
| pos | dataset  |
+-----+----------+
| 0   | 'oecd'   |
| 1   | 'canada' |
+-----+----------+
jsonstat.from_json(json_data)[source]

Transform a json structure into a hierarchy of jsonstat objects.

Parameters:json_data – data structure (dictionary) representing a json
Returns:a JsonStatCollection, JsonStatDataset or JsonStatDimension object
>>> import json, jsonstat
>>> from collections import OrderedDict
>>> json_string_v1 = '''{
...                       "oecd" : {
...                         "value": [1],
...                         "dimension" : {
...                           "id": ["one"],
...                           "size": [1],
...                           "one": { "category": { "index":{"2010":0 } } }
...                         }
...                       }
...                     }'''
>>> json_data = json.loads(json_string_v1, object_pairs_hook=OrderedDict)
>>> jsonstat.from_json(json_data)
JsonstatCollection contains the following JsonStatDataSet:
+-----+---------+
| pos | dataset |
+-----+---------+
| 0   | 'oecd'  |
+-----+---------+
jsonstat.from_string(json_string)[source]

parse a jsonstat string and return the appropriate object

Parameters:json_string – string containing a json
Returns:a JsonStatCollection, JsonStatDataset or JsonStatDimension object
jsonstat.cache_dir(cached_dir=u'', time_to_live=None)[source]

Manage the directory cached_dir where downloaded files are stored

Called without parameters it returns the cached_dir directory; called with a parameter it sets the directory.

Parameters:
  • cached_dir
  • time_to_live
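A small usage sketch (the behaviour without arguments follows the description above):

>>> import jsonstat
>>> cached = jsonstat.cache_dir("/tmp")   # set the directory where downloaded files are stored
>>> cached = jsonstat.cache_dir()         # without arguments, returns the current cache directory
>>> # the time_to_live parameter can be used to customize the cache expiration policy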
jsonstat.download(url, pathname=None)[source]

Download a url and return the downloaded content.

Parameters:
  • url – the url to download
  • pathname – the downloaded content will be stored in the file <cache_dir>/pathname. If pathname is None the filename is automatically generated. If pathname is an absolute path, cache_dir is ignored.

Returns:the contents of url

To set the dir where downloaded files are stored see jsonstat.cache_dir(). The cache expiration policy can be customized (see the time_to_live parameter of jsonstat.cache_dir()).

JsonStatCollection

class jsonstat.JsonStatCollection[source]

Represents a jsonstat collection.

It contains one or more datasets.

>>> import os, jsonstat  
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> collection = jsonstat.from_file(filename)
>>> len(collection)
2
>>> collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+-----------------------------------------------------+
| pos | dataset                                             |
+-----+-----------------------------------------------------+
| 0   | 'Unemployment rate in the OECD countries 2003-2014' |
| 1   | 'Population by sex and age group. Canada. 2012'     |
+-----+-----------------------------------------------------+
__len__()[source]

the number of datasets contained in this collection

dataset(spec)[source]

select a dataset belonging to the collection

Parameters: spec – can be:

  • the name of the dataset (string), for jsonstat v1
  • the position of the dataset (an integer), for jsonstat v1 and v2
Returns:a JsonStatDataSet object
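A short sketch, assuming collection was loaded from the v1 oecd-canada sample as in the notebooks above:

>>> ds_by_name = collection.dataset('oecd')   # by dataset name (jsonstat v1)
>>> ds_by_pos = collection.dataset(0)         # by position (works for v1 and v2)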
parsing
JsonStatCollection.from_file()[source]

Initialize this collection from a file. It is better to use jsonstat.from_file().

Parameters:filename – name containing a jsonstat
Returns:itself to chain call
JsonStatCollection.from_string()[source]

Initialize this collection from a string. It is better to use jsonstat.from_string().

Parameters:json_string – string containing a json
Returns:itself to chain call
JsonStatCollection.from_json()[source]

Initialize this collection from a json structure. It is better to use jsonstat.from_json().

Parameters:json_data – data structure (dictionary) representing a json
Returns:itself to chain call

JsonStatDataSet

class jsonstat.JsonStatDataSet(name=None)[source]

Represents a JsonStat dataset

>>> import os, jsonstat  
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> dataset = jsonstat.from_file(filename).dataset(0)
>>> dataset.label
'Unemployment rate in the OECD countries 2003-2014'
>>> print(dataset)
name:   'Unemployment rate in the OECD countries 2003-2014'
label:  'Unemployment rate in the OECD countries 2003-2014'
size: 432
+-----+---------+--------------------------------+------+--------+
| pos | id      | label                          | size | role   |
+-----+---------+--------------------------------+------+--------+
| 0   | concept | indicator                      | 1    | metric |
| 1   | area    | OECD countries, EU15 and total | 36   | geo    |
| 2   | year    | 2003-2014                      | 12   | time   |
+-----+---------+--------------------------------+------+--------+
>>> dataset.dimension(1)
+-----+--------+----------------------------+
| pos | idx    | label                      |
+-----+--------+----------------------------+
| 0   | 'AU'   | 'Australia'                |
| 1   | 'AT'   | 'Austria'                  |
| 2   | 'BE'   | 'Belgium'                  |
| 3   | 'CA'   | 'Canada'                   |
| 4   | 'CL'   | 'Chile'                    |
| 5   | 'CZ'   | 'Czech Republic'           |
| 6   | 'DK'   | 'Denmark'                  |
| 7   | 'EE'   | 'Estonia'                  |
| 8   | 'FI'   | 'Finland'                  |
| 9   | 'FR'   | 'France'                   |
| 10  | 'DE'   | 'Germany'                  |
| 11  | 'GR'   | 'Greece'                   |
| 12  | 'HU'   | 'Hungary'                  |
| 13  | 'IS'   | 'Iceland'                  |
| 14  | 'IE'   | 'Ireland'                  |
| 15  | 'IL'   | 'Israel'                   |
| 16  | 'IT'   | 'Italy'                    |
| 17  | 'JP'   | 'Japan'                    |
| 18  | 'KR'   | 'Korea'                    |
| 19  | 'LU'   | 'Luxembourg'               |
| 20  | 'MX'   | 'Mexico'                   |
| 21  | 'NL'   | 'Netherlands'              |
| 22  | 'NZ'   | 'New Zealand'              |
| 23  | 'NO'   | 'Norway'                   |
| 24  | 'PL'   | 'Poland'                   |
| 25  | 'PT'   | 'Portugal'                 |
| 26  | 'SK'   | 'Slovak Republic'          |
| 27  | 'SI'   | 'Slovenia'                 |
| 28  | 'ES'   | 'Spain'                    |
| 29  | 'SE'   | 'Sweden'                   |
| 30  | 'CH'   | 'Switzerland'              |
| 31  | 'TR'   | 'Turkey'                   |
| 32  | 'UK'   | 'United Kingdom'           |
| 33  | 'US'   | 'United States'            |
| 34  | 'EU15' | 'Euro area (15 countries)' |
| 35  | 'OECD' | 'total'                    |
+-----+--------+----------------------------+
>>> dataset.data(0)
JsonStatValue(idx=0, value=5.943826289, status=None)
__init__(name=None)[source]

Initialize an empty dataset.

A dataset can have a name (key) when parsing jsonstat format version 1.

Parameters:name – dataset name (for jsonstat v.1)
name()
Getter:returns the name of the dataset
Type:string
label()
Getter:returns the label of the dataset
Type:string
__len__()[source]

returns the size of the dataset

dimensions
JsonStatDataSet.dimension(spec)[source]

get a JsonStatDimension by spec

Parameters: spec – can be the id of the dimension (string) or the position of the dimension (int)
Returns:a JsonStatDimension
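A short sketch, mirroring the notebook usage above (dataset is the oecd sample dataset):

>>> dim_by_id = dataset.dimension('area')   # by dimension id (string)
>>> dim_by_pos = dataset.dimension(1)       # by position (int)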
JsonStatDataSet.dimensions()[source]

returns list of JsonStatDimension

JsonStatDataSet.info_dimensions()[source]

print some info about the dimensions on stdout

querying methods
JsonStatDataSet.data(*args, **kargs)[source]

Returns a JsonStatValue containing the value and status of a datapoint. The datapoint is retrieved according to the parameters.

Parameters:
  • args
    • data(<int>) where i is index into the
    • data(<list>) where lst = [i1,i2,i3,...]) each i indicate the dimension len(lst) == number of dimension
    • data(<dict>) where dict is {k1:v1, k2:v2, ...} dimension of size 1 can be ommitted
  • kargs
    • data(k1=v1,k2=v2,...) where ki are the id or label of dimension vi are the index or label of the category dimension of size 1 can be ommitted
Returns:

a JsonStatValue object

kargs { cat1:value1, ..., cati:valuei, ... } cati can be the id of the dimension or the label of dimension valuei can be the index or label of category ex.:{country:”AU”, “year”:”2014”}

>>> import os, jsonstat  
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> dataset = jsonstat.from_file(filename).dataset(0)
>>> dataset.data(0)
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data(concept='UNR', area='AU', year='2003')
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data(area='AU', year='2003')
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data({'area':'AU', 'year':'2003'})
JsonStatValue(idx=0, value=5.943826289, status=None)
JsonStatDataSet.value(*args, **kargs)[source]

get a value. For the parameters see jsonstat.JsonStatDataSet.data().

Returns:value (typically a number)
JsonStatDataSet.status(*args, **kargs)[source]

get datapoint status

For the parameters see jsonstat.JsonStatDataSet.data().

Returns:status (typically a string)
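
A short sketch of value() and status() side by side, under the same assumptions as the data() doctest above; the selected datapoint (area='AU', year='2003') and its value come from that doctest:

import os, jsonstat

filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
dataset = jsonstat.from_file(filename).dataset(0)

v = dataset.value(area='AU', year='2003')    # 5.943826289 (just the number)
s = dataset.status(area='AU', year='2003')   # None for this datapoint (see the doctest above)
print(v, s)
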
transforming
JsonStatDataSet.to_table(content=u'label', order=None, rtype=<type 'list'>, blocked_dims={}, value_column=u'Value', without_one_dimensions=False)[source]

Transforms a dataset into a table (a list of rows).

The table length is the dataset size + 1 (one extra row for the header).

Parameters:
  • content – can be "label" or "id"
  • order
  • rtype – type of the returned table (default: list)
  • blocked_dims – dimensions frozen to a single category, e.g. {'area':'IT'}
Returns:

a list of rows; the first row is the header
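
A minimal sketch of to_table() under the same assumptions as the examples above; only arguments named in the signature are used:

import os, jsonstat

filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
dataset = jsonstat.from_file(filename).dataset(0)

table = dataset.to_table(content='id')   # list of rows; row 0 is the header
print(table[0])                          # header row
print(table[1])                          # first data row
print(len(table))                        # dataset size + 1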

JsonStatDataSet.to_data_frame(index=None, content=u'label', order=None, blocked_dims={}, value_column=u'Value')[source]

Transforms a dataset into a pandas DataFrame.

extract_bidimensional("year", "country") generates the following dataframe:

    year | country
    2010 | 1
    2011 | 2
    2012 | 3

Parameters:
  • index
  • content
  • blocked_dims
  • order
  • value_column
Returns: a pandas DataFrame
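
A minimal sketch of to_data_frame() under the same assumptions; 'year' is used as the index and the 'area' dimension is frozen to a single category ('IT') via blocked_dims:

import os, jsonstat

filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
dataset = jsonstat.from_file(filename).dataset(0)

# one row per year, 'area' blocked to Italy ('IT'), categories reported by id
df = dataset.to_data_frame('year', content='id', blocked_dims={'area': 'IT'})
print(df.head())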

parsing
JsonStatDataSet.from_file(filename)[source]

read a jsonstat from a file and parse it to initialize this dataset.

It is better to use jsonstat.from_file()

Parameters:filename – path of the file.
Returns:itself to chain calls
JsonStatDataSet.from_string(json_string)[source]

parse a string containing a jsonstat and initialize this dataset

It is better to use jsonstat.from_string()

Parameters:json_string – string containing a jsonstat
Returns:itself to chain calls
JsonStatDataSet.from_json(json_data)[source]

parse a json structure and initialize this dataset

It is better to use jsonstat.from_json()

Parameters:json_data – json structure
Returns:itself to chain calls

JsonStatDimension

class jsonstat.JsonStatDimension(did=None, size=None, pos=None, role=None)[source]

Represents a JsonStat dimension. It is contained in a JsonStatDataSet.

>>> from jsonstat import JsonStatDimension
>>> json_string = '''{
...                    "label" : "concepts",
...                    "category" : {
...                       "index" : { "POP" : 0, "PERCENT" : 1 },
...                       "label" : { "POP" : "population",
...                                   "PERCENT" : "weight of age group in the population" }
...                    }
...                  }
... '''
>>> dim = JsonStatDimension(did="concept", role="metric").from_string(json_string)
>>> len(dim)
2
>>> dim.category(0).index
'POP'
>>> dim.category('POP').label
'population'
>>> dim.category(1)
JsonStatCategory(label='weight of age group in the population', index='PERCENT', pos=1)
>>> print(dim)
+-----+-----------+-----------------------------------------+
| pos | idx       | label                                   |
+-----+-----------+-----------------------------------------+
| 0   | 'POP'     | 'population'                            |
| 1   | 'PERCENT' | 'weight of age group in the population' |
+-----+-----------+-----------------------------------------+
>>> json_string_dimension_sex = '''
... {
...     "label" : "sex",
...     "category" : {
...       "index" : {
...         "M" : 0,
...         "F" : 1,
...         "T" : 2
...       },
...       "label" : {
...         "M" : "men",
...         "F" : "women",
...         "T" : "total"
...       }
...     }
... }
... '''
>>> dim = JsonStatDimension(did="sex").from_string(json_string_dimension_sex)
>>> len(dim)
3
__init__(did=None, size=None, pos=None, role=None)[source]

initialize a dimension

Warning

this is an internal library function (it is not public api)

Parameters:
  • did – id of the dimension
  • size – size of the dimension (number of values)
  • pos – position of the dimension within the dataset
  • role – role of the dimension (time, geo or metric)
did()

id of this dimension

label()

label of this dimension

role()

role of this dimension (can be time, geo or metric)

pos()

position of this dimension with respect to the data set to which this dimension belongs

__len__()[source]

size of this dimension

querying methods
JsonStatDimension.category(spec)[source]

return JsonStatCategory according to spec

Parameters:spec – can be index (string) or label (string) or a position (integer)
Returns:a JsonStatCategory
parsing methods
JsonStatDimension.from_string(json_string)[source]

parse a json string

Parameters:json_string
Returns:itself to chain calls
JsonStatDimension.from_json(json_data)[source]

Parse a json structure representing a dimension

From json-stat.org

It is used to describe a particular dimension. The name of this object must be one of the strings in the id array. There must be one and only one dimension ID object for every dimension in the id array.

The JSON schema for a dimension is roughly:

"dimension": {
    "type": "object",
    "properties": {
        "version": {"$ref": "#/definitions/version"},
        "href": {"$ref": "#/definitions/href"},
        "class": {"type": "string", "enum": ["dimension"]},
        "label": {"type": "string"},
        "category": {"$ref": "#/definitions/category"},
        "note": {"type": "array"},
    },
    "additionalProperties": false
},
Parameters:json_data
Returns:itself to chain calls
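
A small sketch of from_json() chaining, reusing the json_string from the doctest above (parsed into a python structure first); the did and role values are the same ones used there:

import json
from jsonstat import JsonStatDimension

# same JSON as in the doctest above
json_string = '''{
    "label" : "concepts",
    "category" : {
       "index" : { "POP" : 0, "PERCENT" : 1 },
       "label" : { "POP" : "population",
                   "PERCENT" : "weight of age group in the population" }
    }
}'''
json_data = json.loads(json_string)

# from_json returns the dimension itself, so parsing can be chained with the constructor
dim = JsonStatDimension(did="concept", role="metric").from_json(json_data)
print(len(dim))                    # 2 categories
print(dim.category('POP').label)   # 'population'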

Downloader helper

class jsonstat.Downloader(cache_dir=u'./data', time_to_live=None)[source]

Helper class to download json stat files.

It has a very simple cache mechanism

cache_dir()[source]
download(url, filename=None, time_to_live=None)[source]

Download url from the internet.

Store the downloaded content into <cache_dir>/<filename>. If <cache_dir>/<filename> already exists, the content is returned from disk.

Parameters:
  • url – page to be downloaded
  • filename – name of the file where the content of url is stored; None to not store it
  • time_to_live – how many seconds the file is kept on disk; None uses the default time_to_live, 0 ignores any cached version
Returns:

the content of url (str type)
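
A minimal sketch of the Downloader helper, using the constructor and download() signatures above; the cache directory and the one-hour time_to_live are arbitrary choices, and the URL is the oecd-canada sample used elsewhere in these docs:

import jsonstat

# cache downloads under /tmp/jsonstat_cache; cached files older than 3600 seconds are re-downloaded
downloader = jsonstat.Downloader(cache_dir="/tmp/jsonstat_cache", time_to_live=3600)

url = "http://json-stat.org/samples/oecd-canada.json"
content = downloader.download(url, filename="oecd-canada.json")   # str with the body of url

collection = jsonstat.from_string(content)   # parse the downloaded JSON-stat collection
print(collection)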

collection := {
                [ "version" ":" `string` ]
                [ "class" ":" "collection" ]
                [ "href" ":" `url` ]
                [ "updated": `date` ]
                link : {
                    item : [
                        ( dataset )+
                    ]
             }
dataset := {
       "version"   : <version>
       "class"     : "dataset",
       "href"      : <url>
       "label"     : <string>
       "id"        : [ <string>+]           # ex. "id" : ["metric", "time", "geo", "sex"],
       "size"      : [ <int>, <int>, ... ]
       "role"      : roles of dimension
       "value"     : [<int>, <int> ]
       "status"    : status
       "dimension" : { <dimension_id> : dimension, ...}
       "link"      :
       }

dimension_id := <string>

# possible values of dimension are called categories
dimension := {
       "label" : <string>
       "class" : "dimension"
       "category" : {
                 "index"       : dimension_index
                 "label"       : dimension_label
                 "child"       : dimension_child
                 "coordinates" :
                 "unit"        : dimension_unit
                  }
       }

dimension_index :=
                   { <cat1>:int, ....}      # { "2003" : 0, "2004" : 1, "2005" : 2, "2006" : 3 }
                |
                   [ <cat1>, <cat2> ]   # [  2003, 2004 ]
dimension_label :=
                   { <cat1> : <label1>, ... }      # { "POP" : "population" }

Istat Module

This module contains helper classes useful for exploring data published by Istat, the Italian National Institute of Statistics.

Utility Function

istat.cache_dir(cache_dir=None, time_to_live=None)[source]

Manage the directory cache_dir where downloaded files are stored.

Called without parameters it returns the current directory; called with a parameter it sets the directory. :param cache_dir: directory where downloaded files are stored :param time_to_live: how many seconds downloaded files are kept in the cache

istat.areas()[source]

returns a list of IstatArea objects representing all the areas used to classify datasets

istat.area(spec)[source]

returns an IstatArea object conforming to spec. :param spec: name of the istat area

istat.dataset(spec_area, spec_dataset)[source]

returns the IstatDataset identified by `spec_dataset` (name of the dataset) contained in the IstatArea identified by `spec_area` (name of the area) :param spec_area: name of the istat area :param spec_dataset: name of the istat dataset
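
A sketch tying the three helpers together; import istat follows the module name used in this section, the cache directory is an arbitrary choice, and the area and dataset names in the last two calls are placeholders, not verified Istat identifiers:

import istat

istat.cache_dir("/tmp/istat_cache")    # set where downloaded files are cached

for area in istat.areas():             # every IstatArea used to classify datasets
    print(area)

# 'AREA_NAME' and 'DATASET_NAME' are placeholders for real Istat identifiers
area = istat.area('AREA_NAME')
dataset = istat.dataset('AREA_NAME', 'DATASET_NAME')
dataset.info_dimensions()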

IstatArea

class istat.IstatArea(istat_helper, iid, cod, desc)[source]

Represents an Area. An Area contains a list of datasets. Instances of this class are built only by the Istat class.

cod

returns the code of the area

dataset(spec)[source]

get an instance of IstatDataset by spec :param spec: code of the dataset :return: an IstatDataset instance

datasets()[source]

Returns a list of IstatDataset

desc

returns the description of the area

iid

returns the id of the area

info()[source]

print some info about the area

IstatDataset

class istat.IstatDataset(istat_helper, dataset)[source]
cod

returns the code of this dataset

dimension(spec)[source]

Get dimension according to spec

Parameters:spec – can be an int or a string
Returns:an IstatDimension instance
dimensions()[source]

Get list of dimensions

Returns:list of IstatDimension
getvalues(spec, rtype=<class jsonstat.collection.JsonStatCollection>)[source]

get values by dimensions

Parameters:
  • spec – a string, e.g. "1,6,9,0,0"
  • rtype – requested return type
Returns:

if rtype is JsonStatCollection, it returns an instance of JsonStatCollection; otherwise it returns a json structure representing the istat dataset
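
A sketch of getvalues(), assuming istat_dataset is an IstatDataset obtained as in the istat module sketch above; the spec string "1,6,9,0,0" is the example from the parameter description:

# istat_dataset is an IstatDataset (see the istat module sketch above)
collection = istat_dataset.getvalues("1,6,9,0,0")   # default rtype -> a JsonStatCollection
jds = collection.dataset(0)                         # first JsonStat dataset of the collection
print(jds)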

info_dimensions()[source]

print info about dimensions of this dataset

name

returns the name of this dataset

nrdim()[source]

returns the number of dimensions

IstatDimension

class istat.IstatDimension(name, pos, json_data)[source]

Represents an Istat dimension (it is different from a JsonStat dimension).

cod2desc(spec)[source]

returns the description corresponding to the code

desc2cod(spec)[source]

returns the code corresponding to the description

name

the name of the istat dimension

pos

position into the dataset
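
A small sketch of these lookups, assuming istat_dataset is an IstatDataset as in the earlier sketches; the code passed to cod2desc() is illustrative, since valid codes depend on the dataset:

# list name and position of every IstatDimension of an IstatDataset
for dim in istat_dataset.dimensions():
    print(dim.name, dim.pos)

dim = istat_dataset.dimension(0)   # first dimension
desc = dim.cod2desc(1)             # description for code 1 (illustrative code)
cod = dim.desc2cod(desc)           # ... and back to the code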

jsonstat.py


jsonstat.py is a library for reading the JSON-stat data format maintained and promoted by Xavier Badosa. The JSON-stat format is a JSON format for publishing datasets. JSON-stat is used by several institutions to publish statistical data.

The jsonstat.py library tries to mimic, as much as possible, the JSON-stat Javascript Toolkit in python. One of the library's objectives is to be helpful in exploring datasets using jupyter (ipython) notebooks.

For a fast overview of the features you can start from the example notebook oecd-canada-jsonstat_v1.html. You can also check out some of the jupyter example notebooks in the examples directory on github or in the documentation.

As a bonus, jsonstat.py contains useful classes to explore datasets published by Istat (the Italian National Institute of Statistics).

You may also find useful another python library, pyjstat by Miguel Expósito Martín, which also deals with the json-stat format.

This library is in beta status. I am actively working on it and hope to improve this project. For any comment feel free to contact me at gf@26fe.com.

You can find the source on github, where you can open a ticket if you wish.

You can find the generated documentation at readthedocs.

Installation

Pip will install all required dependencies. For installation:

pip install jsonstat.py

Usage

Simple Usage

There is a simple command line interface, so you can experiment with parsing jsonstat files without writing code:

# parsing collection
$ jsonstat info --cache_dir /tmp http://json-stat.org/samples/oecd-canada.json
downloaded file(s) are stored into '/tmp'
download 'http://json-stat.org/samples/oecd-canada.json'
JsonstatCollection contains the following JsonStatDataSet:
+-----+----------+
| pos | dataset  |
+-----+----------+
| 0   | 'oecd'   |
| 1   | 'canada' |
+-----+----------+

# parsing dataset
$ jsonstat info --cache_dir /tmp  "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?sex=T&precision=1&age=TOTAL&s_adj=NSA"
downloaded file(s) are stored into '/tmp'
download 'http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?sex=T&precision=1&age=TOTAL&s_adj=NSA'
name:   'Unemployment rate'
label:  'Unemployment rate'
size: 467
+-----+-------+-------+------+------+
| pos | id    | label | size | role |
+-----+-------+-------+------+------+
| 0   | s_adj | s_adj | 1    |      |
| 1   | age   | age   | 1    |      |
| 2   | sex   | sex   | 1    |      |
| 3   | geo   | geo   | 39   |      |
| 4   | time  | time  | 12   |      |
+-----+-------+-------+------+------+

code example:

url = 'http://json-stat.org/samples/oecd-canada.json'
collection = jsonstat.from_url(url)

# print list of dataset contained into the collection
print(collection)

# select the first dataset of the collection and print a short description
oecd = collection.dataset(0)
print(oecd)

# print description about each dimension of the dataset
for d in oecd.dimensions():
    print(d)

# print a datapoint contained into the dataset
print(oecd.value(area='IT', year='2012'))

# convert a dataset in pandas dataframe
df = oecd.to_data_frame('year')

For more python script examples see the examples directory.

For jupyter (ipython) notebooks see the examples-notebooks directory.

Support

This is an open source project, maintained in my spare time. Maybe a particular feature or function that you would like is missing. But things don't have to stay that way: you can contribute to the project's development yourself, or notify me and ask me to implement it.

Bug reports and feature requests should be submitted using the github issue tracker. Please provide a full traceback of any error you see and if possible a sample file. If you are unable to make a file publicly available then contact me at gf@26fe.com.

You can find support also on the google group.

How to Contribute Code

Any help will be greatly appreciated, just follow those steps:

  1. Fork it. Start a new fork for each independent feature; don't try to fix all problems at the same time, as it's easier for those who will review and merge your changes.
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Write your code. Add unit tests for your changes! If you added a whole new feature, or just improved something, you can be proud of it, so add yourself to the AUTHORS file :-) Update the docs!
  4. Commit your changes (git commit -am 'Added some feature')
  5. Push to the branch (git push origin my-new-feature)
  6. Create a new Pull Request. Click on the large "pull request" button on your repository. Wait for your code to be reviewed and, if you followed all these steps, merged into the main repository.

License

jsonstat.py is provided under the LGPL license. See LICENSE file.
