jsonstat.py¶
jsonstat.py is a library for reading the JSON-stat data format, maintained and promoted by Xavier Badosa. JSON-stat is a lightweight JSON format for publishing datasets, and it is used by several institutions to publish statistical data.
Contents:
Notebooks¶
Notebook: using jsonstat.py python library with jsonstat format version 1.¶
This Jupyter notebook shows the python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format; for more information about the format see the official site. This example shows how to explore the example data file oecd-canada from the json-stat.org site. This file is compliant with version 1 of jsonstat.
# all import here
from __future__ import print_function
import os
import pandas as pd               # used to convert a jsonstat dataset to a pandas dataframe
import jsonstat                   # import the jsonstat.py package
import matplotlib.pyplot as plt   # for plotting
%matplotlib inline
Download or use the cached file oecd-canada.json. Caching the file on disk makes it possible to work off-line and speeds up the exploration of the data.
url = 'http://json-stat.org/samples/oecd-canada.json'
file_name = "oecd-canada.json"
file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada.json
Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.
collection = jsonstat.from_file(file_path)
collection
pos | dataset |
0 | 'oecd' |
1 | 'canada' |
Select the dataset named oecd. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.
oecd = collection.dataset('oecd')
oecd
pos | id | label | size | role |
0 | concept | indicator | 1 | metric |
1 | area | OECD countries, EU15 and total | 36 | geo |
2 | year | 2003-2014 | 12 | time |
Show some detailed information about the dimensions.
oecd.dimension('concept')
pos | idx | label |
0 | 'UNR' | 'unemployment rate' |
oecd.dimension('area')
pos | idx | label |
0 | 'AU' | 'Australia' |
1 | 'AT' | 'Austria' |
2 | 'BE' | 'Belgium' |
3 | 'CA' | 'Canada' | ... | ... | ... |
oecd.dimension('year')
pos | idx | label |
0 | '2003' | '' |
1 | '2004' | '' |
2 | '2005' | '' |
3 | '2006' | '' | ... | ... | ... |
Accessing value in the dataset¶
Print the value in oecd dataset for area = IT and year = 2012
oecd.data(area='IT', year='2012')
JsonStatValue(idx=201, value=10.55546863, status=None)
oecd.value(area='IT', year='2012')
10.55546863
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128
5.39663128
oecd.value(concept='UNR',area='AU',year='2004')
5.39663128
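The value() accessor can also be used in a plain loop to extract a whole series. A minimal sketch, assuming the oecd dataset loaded above (the year dimension spans 2003-2014):
# build the Italian unemployment series year by year using value()
years = [str(y) for y in range(2003, 2015)]
it_series = [(y, oecd.value(area='IT', year=y)) for y in years]
for y, v in it_series:
    print(y, v)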
Transforming the dataset into a pandas DataFrame¶
df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()
concept | area | Value | |
---|---|---|---|
year | |||
2003 | UNR | AU | 5.943826 |
2004 | UNR | AU | 5.396631 |
2005 | UNR | AU | 5.044791 |
2006 | UNR | AU | 4.789363 |
2007 | UNR | AU | 4.379649 |
df_oecd['area'].describe() # area contains 36 values
count 432
unique 36
top JP
freq 12
Name: area, dtype: object
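The 36 codes counted by describe() come from the area dimension. The mapping between codes and labels can be rebuilt from the dimension object; a small sketch, assuming (as in the API reference below) that each category exposes index and label:
# map area codes (e.g. 'AU') to their labels (e.g. 'Australia')
area_dim = oecd.dimension('area')
code2label = {area_dim.category(i).index: area_dim.category(i).label
              for i in range(len(area_dim))}
sorted(code2label.items())[:4]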
Extract a subset of the data from the jsonstat dataset into a pandas dataframe. We can transform the dataset by freezing the dimension area to a specific country (Canada).
df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()
concept | area | Value | |
---|---|---|---|
year | |||
2010 | UNR | CA | 7.988900 |
2011 | UNR | CA | 7.453610 |
2012 | UNR | CA | 7.323584 |
2013 | UNR | CA | 7.169742 |
2014 | UNR | CA | 6.881227 |
df_oecd_ca['area'].describe() # area contains only one value (CA)
count 12
unique 1
top CA
freq 12
Name: area, dtype: object
df_oecd_ca.plot(grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x113980908>

Transforming a dataset into a python list¶
oecd.to_table()[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
['unemployment rate', 'Australia', '2003', 5.943826289],
['unemployment rate', 'Australia', '2004', 5.39663128],
['unemployment rate', 'Australia', '2005', 5.044790587],
['unemployment rate', 'Australia', '2006', 4.789362794]]
It is possible to transform the jsonstat data into a table with the dimensions in a different order.
order = [i.did() for i in oecd.dimensions()]
order = order[::-1] # reverse list
table = oecd.to_table(order=order)
table[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
['unemployment rate', 'Australia', '2003', 5.943826289],
['unemployment rate', 'Austria', '2003', 4.278559338],
['unemployment rate', 'Belgium', '2003', 8.158333333],
['unemployment rate', 'Canada', '2003', 7.594616751]]
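Because to_table() returns plain python lists with the header in the first row, the result can be handed to any other tool. A small sketch that writes the table to a CSV file (the file name is only an example) and rebuilds a pandas DataFrame from the same rows:
import csv
import pandas as pd

table = oecd.to_table()                  # first row is the header
with open("oecd-canada.csv", "w") as f:  # example output path
    csv.writer(f).writerows(table)

df_from_table = pd.DataFrame(table[1:], columns=table[0])
df_from_table.head()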
Notebook: using jsonstat.py python library with jsonstat format version 2.¶
This Jupyter notebook shows the python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format; for more information about the format see the official site.
This notebook uses the data file oecd-canada-col.json from the json-stat.org site, which is compliant with version 2 of jsonstat. The notebook is the same as the version 1 notebook; the only difference is the data source.
# all import here
from __future__ import print_function
import os
import pandas as pd               # used to convert a jsonstat dataset to a pandas dataframe
import jsonstat                   # import the jsonstat.py package
import matplotlib.pyplot as plt   # for plotting
%matplotlib inline
Download or use the cached file oecd-canada-col.json. Caching the file on disk makes it possible to work off-line and speeds up the exploration of the data.
url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"
file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada-col.json
Initialize a JsonStatCollection from the file and print the list of datasets contained in the collection.
collection = jsonstat.from_file(file_path)
collection
pos | dataset |
0 | 'Unemployment rate in the OECD countries 2003-2014' |
1 | 'Population by sex and age group. Canada. 2012' |
Select the first dataset. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.
oecd = collection.dataset(0)
oecd
pos | id | label | size | role |
0 | concept | indicator | 1 | metric |
1 | area | OECD countries, EU15 and total | 36 | geo |
2 | year | 2003-2014 | 12 | time |
oecd.dimension('concept')
pos | idx | label |
0 | 'UNR' | 'unemployment rate' |
oecd.dimension('area')
pos | idx | label |
0 | 'AU' | 'Australia' |
1 | 'AT' | 'Austria' |
2 | 'BE' | 'Belgium' |
3 | 'CA' | 'Canada' | ... | ... | ... |
oecd.dimension('year')
pos | idx | label |
0 | '2003' | '' |
1 | '2004' | '' |
2 | '2005' | '' |
3 | '2006' | '' | ... | ... | ... |
The tables above show some detailed information about the dimensions.
Accessing value in the dataset¶
Print the value in oecd dataset for area = IT and year = 2012
oecd.data(area='IT', year='2012')
JsonStatValue(idx=201, value=10.55546863, status=None)
oecd.value(area='IT', year='2012')
10.55546863
oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128
5.39663128
oecd.value(concept='UNR',area='AU',year='2004')
5.39663128
Transforming the dataset into a pandas DataFrame¶
df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()
concept | area | Value | |
---|---|---|---|
year | |||
2003 | UNR | AU | 5.943826 |
2004 | UNR | AU | 5.396631 |
2005 | UNR | AU | 5.044791 |
2006 | UNR | AU | 4.789363 |
2007 | UNR | AU | 4.379649 |
df_oecd['area'].describe() # area contains 36 values
count 432
unique 36
top ES
freq 12
Name: area, dtype: object
Extract a subset of the data from the jsonstat dataset into a pandas dataframe. We can transform the dataset by freezing the dimension area to a specific country (Canada).
df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()
concept | area | Value | |
---|---|---|---|
year | |||
2010 | UNR | CA | 7.988900 |
2011 | UNR | CA | 7.453610 |
2012 | UNR | CA | 7.323584 |
2013 | UNR | CA | 7.169742 |
2014 | UNR | CA | 6.881227 |
df_oecd_ca['area'].describe() # area contains only one value (CA)
count 12
unique 1
top CA
freq 12
Name: area, dtype: object
df_oecd_ca.plot(grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x114298198>

Transforming a dataset into a python list¶
oecd.to_table()[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
['unemployment rate', 'Australia', '2003', 5.943826289],
['unemployment rate', 'Australia', '2004', 5.39663128],
['unemployment rate', 'Australia', '2005', 5.044790587],
['unemployment rate', 'Australia', '2006', 4.789362794]]
It is possible to transform the jsonstat data into a table with the dimensions in a different order.
order = [i.did() for i in oecd.dimensions()]
order = order[::-1] # reverse list
table = oecd.to_table(order=order)
table[:5]
[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
['unemployment rate', 'Australia', '2003', 5.943826289],
['unemployment rate', 'Austria', '2003', 4.278559338],
['unemployment rate', 'Belgium', '2003', 8.158333333],
['unemployment rate', 'Canada', '2003', 7.594616751]]
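Instead of downloading the file by hand, the collection can also be fetched in a single step with jsonstat.from_url(), which caches the download under the directory set with jsonstat.cache_dir() (both functions are described in the api reference below). A minimal sketch:
import jsonstat

jsonstat.cache_dir("/tmp")   # downloaded files are cached here
collection = jsonstat.from_url('http://json-stat.org/samples/oecd-canada-col.json')
oecd = collection.dataset(0)
oecd.value(area='IT', year='2012')   # same accessors as with from_file()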
Notebook: using jsonstat.py with eurostat api¶
This Jupyter notebook shows the python library jsonstat.py in action. It shows how to explore datasets downloaded from a data provider. This notebook uses some datasets from Eurostat, which provides a rest api to download its datasets; you can find details about the api here. A query builder is available for discovering the rest api parameters.
# all import here
from __future__ import print_function
import os
import pandas as pd
import jsonstat
import matplotlib.pyplot as plt
%matplotlib inline
1 - Exploring data with one dimension (time) with size > 1¶
The following cell downloads a dataset from eurostat. If the file has already been downloaded, the copy present on disk is used. Caching the file avoids downloading the dataset every time the notebook runs, speeds up development, and provides consistent results. You can see the raw data here
url_1 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&unit=EUR_HAB&indic_na=B1GM'
file_name_1 = "eurostat-name_gpd_c-geo_IT.json"
file_path_1 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_1))
if os.path.exists(file_path_1):
    print("using already downloaded file {}".format(file_path_1))
else:
    print("download file")
    jsonstat.download(url_1, file_name_1)
    file_path_1 = file_name_1
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT.json
Initialize JsonStatCollection with eurostat data and print some info about the collection.
collection_1 = jsonstat.from_file(file_path_1)
collection_1
pos | dataset |
0 | 'nama_gdp_c' |
The previous collection contains only one dataset, named ‘nama_gdp_c’.
nama_gdp_c_1 = collection_1.dataset('nama_gdp_c')
nama_gdp_c_1
pos | id | label | size | role |
0 | unit | unit | 1 | |
1 | indic_na | indic_na | 1 | |
2 | geo | geo | 1 | |
3 | time | time | 69 |
All dimensions of the dataset ‘nama_gdp_c’ are of size 1, with the exception of the time dimension. Let’s explore the time dimension.
nama_gdp_c_1.dimension('time')
pos | idx | label |
0 | '1946' | '1946' |
1 | '1947' | '1947' |
2 | '1948' | '1948' |
3 | '1949' | '1949' | ... | ... | ... |
Get value for year 2012.
nama_gdp_c_1.value(time='2012')
25700
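The whole series can be read off by walking the categories of the time dimension. A sketch, assuming the nama_gdp_c_1 dataset loaded above and that JsonStatValue exposes its fields as attributes (as its repr suggests):
# collect (year, value) pairs for every category of the 'time' dimension
time_dim = nama_gdp_c_1.dimension('time')
series = []
for i in range(len(time_dim)):
    year = time_dim.category(i).index        # e.g. '1946', '1947', ...
    point = nama_gdp_c_1.data(time=year)     # JsonStatValue(idx=..., value=..., status=...)
    series.append((year, point.value))
series[-5:]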
Convert the jsonstat data into a pandas dataframe.
df_1 = nama_gdp_c_1.to_data_frame('time', content='id')
df_1.tail()
unit | indic_na | geo | Value | |
---|---|---|---|---|
time | ||||
2010 | EUR_HAB | B1GM | IT | 25700.0 |
2011 | EUR_HAB | B1GM | IT | 26000.0 |
2012 | EUR_HAB | B1GM | IT | 25700.0 |
2013 | EUR_HAB | B1GM | IT | 25600.0 |
2014 | EUR_HAB | B1GM | IT | NaN |
Adding a simple plot
df_1 = df_1.dropna() # remove rows with NaN values
df_1.plot(grid=True, figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x114bc12b0>

2 - Exploring data with two dimensions (geo, time) with size > 1¶
Download or use the jsonstat file cached on disk. The cache avoids downloading from the internet during development and makes things a bit faster. You can see the raw data here
url_2 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&geo=FR&unit=EUR_HAB&indic_na=B1GM'
file_name_2 = "eurostat-name_gpd_c-geo_IT_FR.json"
file_path_2 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_2))
if os.path.exists(file_path_2):
    print("using already downloaded file {}".format(file_path_2))
else:
    print("download file and storing on disk")
    jsonstat.download(url_2, file_name_2)
    file_path_2 = file_name_2
using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT_FR.json
collection_2 = jsonstat.from_file(file_path_2)
nama_gdp_c_2 = collection_2.dataset('nama_gdp_c')
nama_gdp_c_2
pos | id | label | size | role |
0 | unit | unit | 1 | |
1 | indic_na | indic_na | 1 | |
2 | geo | geo | 2 | |
3 | time | time | 69 |
nama_gdp_c_2.dimension('geo')
pos | idx | label |
0 | 'FR' | 'France' |
1 | 'IT' | 'Italy' |
nama_gdp_c_2.value(time='2012',geo='IT')
25700
nama_gdp_c_2.value(time='2012',geo='FR')
31100
df_2 = nama_gdp_c_2.to_table(content='id',rtype=pd.DataFrame)
df_2.tail()
unit | indic_na | geo | time | Value | |
---|---|---|---|---|---|
133 | EUR_HAB | B1GM | IT | 2010 | 25700.0 |
134 | EUR_HAB | B1GM | IT | 2011 | 26000.0 |
135 | EUR_HAB | B1GM | IT | 2012 | 25700.0 |
136 | EUR_HAB | B1GM | IT | 2013 | 25600.0 |
137 | EUR_HAB | B1GM | IT | 2014 | NaN |
df_FR_IT = df_2.dropna()[['time', 'geo', 'Value']]
df_FR_IT = df_FR_IT.pivot('time', 'geo', 'Value')
df_FR_IT.plot(grid=True, figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x114c0f0b8>

df_3 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'FR'})
df_3 = df_3.dropna()
df_3.plot(grid=True,figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x1178e7d30>

df_4 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'IT'})
df_4 = df_4.dropna()
df_4.plot(grid=True,figsize=(20,5))
<matplotlib.axes._subplots.AxesSubplot at 0x117947630>

Notebook: using jsonstat.py to explore ISTAT data (house price index)¶
This Jupyter notebook shows how to use the jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a rest api for querying italian statistics.
We start by importing some modules.
from __future__ import print_function
import os
import istat
from IPython.core.display import HTML
Step 1: using istat module to get a jsonstat collection¶
The following code sets a cache dir where the json files downloaded from the Istat api are stored. Storing files on disk speeds up development and assures consistent results over time. You can always delete the files to download a fresh copy.
cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'
Using the istat api, we can show the istat areas used to categorize the datasets.
istat.areas()
id | desc |
---|---|
3 | 2011 Population and housing census |
4 | Enterprises |
7 | Environment and Energy |
8 | Population and Households |
9 | Households Economic Conditions and Disparities |
10 | Health statistics |
11 | Social Security and Welfare |
12 | Education and training |
13 | Communication, culture and leisure |
14 | Justice and Security |
15 | Citizens' opinions and satisfaction with life |
16 | Social participation |
17 | National Accounts |
19 | Agriculture |
20 | Industry and Construction |
21 | Services |
22 | Public Administrations and Private Institutions |
24 | External Trade and Internationalisation |
25 | Prices |
26 | Labour |
The following code lists all datasets contained in the area Prices.
istat_area_prices = istat.area('Prices')
istat_area_prices.datasets()
cod | name | dim |
---|---|---|
DCSC_FABBRESID_1 | Construction costs index - monthly data | 5 |
DCSC_PREZPRODSERV_1 | Services producer prices index | 5 |
DCSC_PREZZPIND_1 | Producer price index for industrial products - monthly data | 6 |
DCSP_FOI1 | FOI Monthly data until 2010 | 5 |
DCSP_FOI1B2010 | FOI - Monthly data from 2011 to 2015 | 5 |
DCSP_FOI1B2015 | FOI - Monthly data from 2016 onwards | 5 |
DCSP_FOI2 | FOI Annual average until 2010 | 5 |
DCSP_FOI2B2010 | FOI Annual average from 2011 onwards | 5 |
DCSP_FOI2B2015 | FOI - Annual average from 2016 onwards | 5 |
DCSP_FOI3 | FOI Weights until 2010 | 4 |
DCSP_FOI3B2010 | FOI - Weights from 2011 to 2015 | 4 |
DCSP_FOI3B2015 | FOI - Weights from 2016 onwards | 4 |
DCSP_IPAB | House price index | 5 |
DCSP_IPCA1 | HICP - Monthly data from 2001 to 2015 (base 2005=100) | 5 |
DCSP_IPCA1B2015 | HICP - Monthly data from 2001 onwards (base 2015=100) | 5 |
DCSP_IPCA2 | HICP - Annual average from 2001 to 2015 (base 2005=100) | 5 |
DCSP_IPCA2B2015 | HICP - Annual average from 2001 onwards (base 2015=100) | 5 |
DCSP_IPCA3 | HICP Weights from 2001 onwards | 4 |
DCSP_IPCATC1 | HICP at constant tax rates - Monthly data from 2002 to 2015 (base 2005=100) | 5 |
DCSP_IPCATC1B2015 | HICP at constant tax rates - Monthly data from 2002 onwards (base 2015=100) | 5 |
DCSP_IPCATC2 | HICP at constant tax rates - Annual average from 2002 to 2015 (base 2005=100) | 5 |
DCSP_IPCATC2B2015 | HICP at constant tax rates - Annual average from 2002 onwards (base 2015=100) | 5 |
DCSP_NIC1B2015 | NIC - Monthly data from 2016 onwards | 5 |
DCSP_NIC3B2015 | NIC - Weights from 2016 onwards | 4 |
DCSP_NICDUE | NIC Annual average until 2010 | 5 |
DCSP_NICDUEB2010 | NIC Annual average from 2011 onwards | 5 |
DCSP_NICTRE | NIC Weights until 2010 | 4 |
DCSP_NICTREB2010 | NIC - Weights from 2011 to 2015 | 4 |
DCSP_NICUNOB | NIC Monthly data until 2010 | 5 |
DCSP_NICUNOBB2010 | NIC - Monthly data from 2011 to 2015 | 5 |
List all dimensions of the dataset DCSP_IPAB (House price index).
istat_dataset_dcsp_ipab = istat_area_prices.dataset('DCSP_IPAB')
istat_dataset_dcsp_ipab
nr | name | nr. values | values (first 3 values) |
---|---|---|---|
0 | Territory | 1 | 1:'Italy' |
1 | Index type | 3 | 18:'house price index (base 2010=100) - quarterly data', 19:'house price index (base 2010=100) - annual average', 20:'house price index (base 2010=100) - weights' ... |
2 | Measure | 5 | 8:'annual average rate of change', 4:'index number', 22:'not applicable' ... |
3 | Purchases of dwellings | 3 | 4:'H1 - all items', 5:'H11 - new dwellings', 6:'H12 - existing dwellings' ... |
4 | Time and frequency | 29 | 2112:'Q1-2011', 2178:'Q3-2014', 2116:'Q2-2011' ... |
Finally, from the istat dataset we extract data in jsonstat format by specifying the dimensions we are interested in.
spec = {
    "Territory": 1, "Index type": 18,
    # "Measure": 0, # "Purchases of dwelling": 0, # "Time and frequency": 0
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_dcsp_ipab.getvalues(spec)
collection
pos | dataset |
0 | 'IDMISURA1*IDTYPPURCH*IDTIME' |
The previous call is equivalent to calling the istat api with the string of numbers “1,18,0,0,0”. Below is the mapping between the numbers and the dimensions:
dimension | value | label |
---|---|---|
Territory | 1 | Italy |
Index type | 18 | house price index (base 2010=100) - quarterly data |
Measure | 0 | ALL |
Purchases of dwellings | 0 | ALL |
Time and frequency | 0 | ALL |
json_stat_data = istat_dataset_dcsp_ipab.getvalues("1,18,0,0,0")
json_stat_data
pos | dataset |
0 | 'IDMISURA1*IDTYPPURCH*IDTIME' |
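The positional string can be derived mechanically from a spec dict once the order of the dimensions is known. A hypothetical helper (not part of the istat module), using the dimension order shown in the table above:
def spec_to_string(spec, dim_names):
    # a missing dimension (or an explicit 0) means "all values"
    return ",".join(str(spec.get(name, 0)) for name in dim_names)

dims = ["Territory", "Index type", "Measure",
        "Purchases of dwellings", "Time and frequency"]
spec_to_string({"Territory": 1, "Index type": 18}, dims)   # -> '1,18,0,0,0'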
Step 2: using the jsonstat.py api¶
Now that we have a jsonstat collection, let’s explore it with the jsonstat.py api.
Print some info about one dataset contained in the above jsonstat collection.
jsonstat_dataset = collection.dataset('IDMISURA1*IDTYPPURCH*IDTIME')
jsonstat_dataset
pos | id | label | size | role |
0 | IDMISURA1 | Measure | 3 | |
1 | IDTYPPURCH | Purchases of dwellings | 3 | |
2 | IDTIME | Time and frequency | 23 |
Print info about the dimensions to get an idea about the data
jsonstat_dataset.dimension('IDMISURA1')
pos | idx | label |
0 | '4' | 'index number' |
1 | '6' | 'percentage changes on the previous period' |
2 | '7' | 'percentage changes on the same period of the previous year' |
jsonstat_dataset.dimension('IDTYPPURCH')
pos | idx | label |
0 | '4' | 'H1 - all items' |
1 | '5' | 'H11 - new dwellings' |
2 | '6' | 'H12 - existing dwellings' |
jsonstat_dataset.dimension('IDTIME')
pos | idx | label |
0 | '2093' | 'Q1-2010' |
1 | '2097' | 'Q2-2010' |
2 | '2102' | 'Q3-2010' |
3 | '2106' | 'Q4-2010' | ... | ... | ... |
import pandas as pd
df = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df.head()
Measure | Purchases of dwellings | Time and frequency | Value | |
---|---|---|---|---|
0 | index number | H1 - all items | Q1-2010 | 99.5 |
1 | index number | H1 - all items | Q2-2010 | 100.0 |
2 | index number | H1 - all items | Q3-2010 | 100.3 |
3 | index number | H1 - all items | Q4-2010 | 100.2 |
4 | index number | H1 - all items | Q1-2011 | 100.1 |
filtered = df.loc[
    (df['Measure'] == 'index number') & (df['Purchases of dwellings'] == 'H1 - all items'),
    ['Time and frequency', 'Value']
]
filtered.set_index('Time and frequency')
Value | |
---|---|
Time and frequency | |
Q1-2010 | 99.5 |
Q2-2010 | 100.0 |
Q3-2010 | 100.3 |
Q4-2010 | 100.2 |
Q1-2011 | 100.1 |
Q2-2011 | 101.2 |
Q3-2011 | 101.2 |
Q4-2011 | 100.5 |
Q1-2012 | 99.9 |
Q2-2012 | 99.1 |
Q3-2012 | 97.4 |
Q4-2012 | 95.3 |
Q1-2013 | 93.9 |
Q2-2013 | 93.3 |
Q3-2013 | 91.9 |
Q4-2013 | 90.2 |
Q1-2014 | 89.3 |
Q2-2014 | 88.7 |
Q3-2014 | 88.3 |
Q4-2014 | 86.9 |
Q1-2015 | 86.1 |
Q2-2015 | 86.1 |
Q3-2015 | 86.3 |
%matplotlib inline
import matplotlib.pyplot as plt
values = filtered['Value'].tolist()
labels = filtered['Time and frequency']
xs = [i + 0.1 for i, _ in enumerate(values)]
# bars are by default width 0.8, so we'll add 0.1 to the left coordinates
# so that each bar is centered
# plot bars with left x-coordinates [xs] and heights [values]
plt.figure(figsize=(15,4))
plt.bar(xs, values)
plt.ylabel("value")
plt.title("house index")
# label the x-axis with the time periods at the bar centers
plt.xticks([i + 0.5 for i, _ in enumerate(labels)], labels, rotation='vertical')
plt.show()

Notebook: using jsonstat.py to explore ISTAT data (unemployment)¶
This Jupyter notebook shows how to use jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a rest api for browsing italian statistics. This api can return results in jsonstat format.
from __future__ import print_function
import os
import pandas as pd
from IPython.core.display import HTML
import matplotlib.pyplot as plt
%matplotlib inline
import istat
Using istat api¶
The next step is to set a cache dir where the json files downloaded from Istat are stored. Storing files on disk speeds up development and assures consistent results over time. If needed, you can delete the downloaded files to get a fresh copy.
cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached")) # you could choose /tmp
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'
List all istat areas
istat.areas()
id | desc |
---|---|
3 | 2011 Population and housing census |
4 | Enterprises |
7 | Environment and Energy |
8 | Population and Households |
9 | Households Economic Conditions and Disparities |
10 | Health statistics |
11 | Social Security and Welfare |
12 | Education and training |
13 | Communication, culture and leisure |
14 | Justice and Security |
15 | Citizens' opinions and satisfaction with life |
16 | Social participation |
17 | National Accounts |
19 | Agriculture |
20 | Industry and Construction |
21 | Services |
22 | Public Administrations and Private Institutions |
24 | External Trade and Internationalisation |
25 | Prices |
26 | Labour |
List all datasets contained in the area LAB (Labour).
istat_area_lab = istat.area('LAB')
istat_area_lab
List all dimensions of the dataset DCCV_TAXDISOCCU (Unemployment rate).
istat_dataset_taxdisoccu = istat_area_lab.dataset('DCCV_TAXDISOCCU')
istat_dataset_taxdisoccu
nr | name | nr. values | values (first 3 values) |
---|---|---|---|
0 | Territory | 136 | 1:'Italy', 3:'Nord', 4:'Nord-ovest' ... |
1 | Data type | 1 | 6:'unemployment rate' |
2 | Measure | 1 | 1:'percentage values' |
3 | Gender | 3 | 1:'males', 2:'females', 3:'total' ... |
4 | Age class | 14 | 32:'18-29 years', 3:'20-24 years', 4:'15-24 years' ... |
5 | Highest level of education attained | 5 | 11:'tertiary (university, doctoral and specialization courses)', 12:'total', 3:'primary school certificate, no educational degree' ... |
6 | Citizenship | 3 | 1:'italian', 2:'foreign', 3:'total' ... |
7 | Duration of unemployment | 2 | 2:'12 months and more', 3:'total' |
8 | Time and frequency | 193 | 1536:'Q4-1980', 2049:'Q4-2007', 1540:'1981' ... |
Extract data from dataset DCCV_TAXDISOCCU
spec = {
    "Territory": 0,                             # 0 -> all territories
    "Data type": 6,                             # 6:'unemployment rate'
    'Measure': 1,                               # 1:'percentage values'
    'Gender': 3,                                # 3:'total'
    'Age class': 31,                            # 31:'15-74 years'
    'Highest level of education attained': 12,  # 12:'total'
    'Citizenship': 3,                           # 3:'total'
    'Duration of unemployment': 3,              # 3:'total'
    'Time and frequency': 0                     # 0 -> all periods
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_taxdisoccu.getvalues(spec)
collection
pos | dataset |
0 | 'IDITTER107*IDTIME' |
Print some info about the only dataset contained in the above jsonstat collection.
jsonstat_dataset = collection.dataset(0)
jsonstat_dataset
pos | id | label | size | role |
0 | IDITTER107 | Territory | 135 | |
1 | IDTIME | Time and frequency | 58 |
df_all = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df_all.head()
Territory | Time and frequency | Value | |
---|---|---|---|
0 | Italy | 2004 | 8.01 |
1 | Italy | Q1-2004 | 8.68 |
2 | Italy | Q2-2004 | 7.88 |
3 | Italy | Q3-2004 | 7.33 |
4 | Italy | Q4-2004 | 8.17 |
df_all.pivot('Territory', 'Time and frequency', 'Value').head()
Time and frequency | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | ... | Q4-2005 | Q4-2006 | Q4-2007 | Q4-2008 | Q4-2009 | Q4-2010 | Q4-2011 | Q4-2012 | Q4-2013 | Q4-2014 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Territory | |||||||||||||||||||||
Abruzzo | 7.71 | 7.88 | 6.57 | 6.17 | 6.63 | 7.97 | 8.67 | 8.59 | 10.85 | 11.29 | ... | 6.95 | 6.84 | 5.87 | 6.67 | 7.02 | 9.15 | 9.48 | 10.48 | 11.21 | 12.08 |
Agrigento | 20.18 | 17.62 | 13.40 | 16.91 | 16.72 | 17.43 | 19.42 | 17.61 | 19.48 | 20.98 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Alessandria | 5.34 | 5.37 | 4.65 | 4.63 | 4.85 | 5.81 | 5.34 | 6.66 | 10.48 | 11.80 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Ancona | 5.11 | 4.14 | 4.05 | 3.49 | 3.78 | 5.82 | 4.94 | 6.84 | 9.20 | 11.27 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Arezzo | 4.55 | 5.50 | 4.88 | 4.61 | 4.91 | 5.51 | 5.87 | 6.04 | 7.33 | 8.04 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 58 columns
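The pivoted table mixes yearly and quarterly columns. A sketch that keeps only the yearly columns and plots a few territories (the names are taken from the rows shown above):
pivoted = df_all.pivot('Territory', 'Time and frequency', 'Value')
yearly_cols = [c for c in pivoted.columns if not c.startswith('Q')]
pivoted.loc[['Abruzzo', 'Agrigento', 'Alessandria'], yearly_cols].T.plot(grid=True, figsize=(18, 4))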
spec = {
    "Territory": 1,                             # 1:'Italy'
    "Data type": 6,                             # 6:'unemployment rate'
    'Measure': 1,
    'Gender': 3,
    'Age class': 0,                             # 0 -> all age classes
    'Highest level of education attained': 12,  # 12:'total'
    'Citizenship': 3,                           # 3:'total'
    'Duration of unemployment': 3,              # 3:'total'
    'Time and frequency': 0                     # 0 -> all periods
}
# convert istat dataset into jsonstat collection and print some info
collection_2 = istat_dataset_taxdisoccu.getvalues(spec)
collection_2
pos | dataset |
0 | 'IDCLASETA28*IDTIME' |
df = collection_2.dataset(0).to_table(rtype=pd.DataFrame, blocked_dims={'IDCLASETA28':'31'})
df.head(6)
Age class | Time and frequency | Value | |
---|---|---|---|
0 | 15-74 years | Q4-1992 | NaN |
1 | 15-74 years | 1993 | NaN |
2 | 15-74 years | Q1-1993 | NaN |
3 | 15-74 years | Q2-1993 | NaN |
4 | 15-74 years | Q3-1993 | NaN |
5 | 15-74 years | Q4-1993 | NaN |
df = df.dropna()
df = df[df['Time and frequency'].str.contains(r'^Q.*')]
# df = df.set_index('Time and frequency')
df.head(6)
Age class | Time and frequency | Value | |
---|---|---|---|
57 | 15-74 years | Q1-2004 | 8.68 |
58 | 15-74 years | Q2-2004 | 7.88 |
59 | 15-74 years | Q3-2004 | 7.33 |
60 | 15-74 years | Q4-2004 | 8.17 |
62 | 15-74 years | Q1-2005 | 8.27 |
63 | 15-74 years | Q2-2005 | 7.54 |
df.plot(x='Time and frequency',y='Value', figsize=(18,4))
<matplotlib.axes._subplots.AxesSubplot at 0x1184b1908>

fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(111)
plt.grid(True)
df.plot(x='Time and frequency',y='Value', ax=ax, grid=True)
# kind='barh', , alpha=a, legend=False, color=customcmap,
# edgecolor='w', xlim=(0,max(df['population'])), title=ttl)
<matplotlib.axes._subplots.AxesSubplot at 0x11a898b70>

# plt.figure(figsize=(7,4))
# plt.plot(df['Time and frequency'],df['Value'], lw=1.5, label='1st')
# plt.plot(y[:,1], lw=1.5, label='2st')
# plt.plot(y,'ro')
# plt.grid(True)
# plt.legend(loc=0)
# plt.axis('tight')
# plt.xlabel('index')
# plt.ylabel('value')
# plt.title('a simple plot')
# labour force (forza lavoro)
istat_forzlv = istat.dataset('LAB', 'DCCV_FORZLV')
spec = {
    "Territory": 'Italy',
    "Data type": 'number of labour force 15 years and more (thousands)',
    'Measure': 'absolute values',
    'Gender': 'total',
    'Age class': '15 years and over',
    'Highest level of education attained': 'total',
    'Citizenship': 'total',
    'Time and frequency': 0
}
df_forzlv = istat_forzlv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_forzlv = df_forzlv.dropna()
df_forzlv = df_forzlv[df_forzlv['Time and frequency'].str.contains(r'^Q.*')]
df_forzlv.tail(6)
Time and frequency | Value | |
---|---|---|
187 | Q2-2014 | 25419.15 |
188 | Q3-2014 | 25373.70 |
189 | Q4-2014 | 25794.44 |
190 | Q1-2015 | 25460.25 |
191 | Q2-2015 | 25598.29 |
192 | Q3-2015 | 25321.61 |
istat_inattiv = istat.dataset('LAB', 'DCCV_INATTIV')
# HTML(istat_inattiv.info_dimensions_as_html())
spec = {
    "Territory": 'Italy',
    "Data type": 'number of inactive persons',
    'Measure': 'absolute values',
    'Gender': 'total',
    'Age class': '15 years and over',
    'Highest level of education attained': 'total',
    'Time and frequency': 0
}
df_inattiv = istat_inattiv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_inattiv = df_inattiv.dropna()
df_inattiv = df_inattiv[df_inattiv['Time and frequency'].str.contains(r'^Q.*')]
df_inattiv.tail(6)
citizenship | Labour status | Inactivity reasons | Main status | Time and frequency | Value | |
---|---|---|---|---|---|---|
24756 | total | total | total | total | Q2-2014 | 26594.57 |
24757 | total | total | total | total | Q3-2014 | 26646.90 |
24758 | total | total | total | total | Q4-2014 | 26257.15 |
24759 | total | total | total | total | Q1-2015 | 26608.07 |
24760 | total | total | total | total | Q2-2015 | 26487.67 |
24761 | total | total | total | total | Q3-2015 | 26746.26 |
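The two quarterly series just downloaded can be combined: adding the inactive persons to the labour force gives the population aged 15 and over. A sketch with pandas, merging the two series on their time column:
pop15 = pd.merge(df_forzlv[['Time and frequency', 'Value']],
                 df_inattiv[['Time and frequency', 'Value']],
                 on='Time and frequency',
                 suffixes=('_labour_force', '_inactive'))
pop15['population 15+'] = pop15['Value_labour_force'] + pop15['Value_inactive']
pop15.tail(6)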
Notebook: using jsonstat.py to explore ISTAT data (unemployment)¶
This Jupyter notebook shows how to use the jsonstat.py python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a rest api for browsing italian statistics. This api can return results in jsonstat format.
The labour force is made up of employed and unemployed people. The population aged 15 and over is made up of the labour force plus inactive people.
Unemployment rate = unemployed / labour force
Download datasets from Istat¶
from __future__ import print_function
import os
import pandas as pd
from IPython.core.display import HTML
import matplotlib.pyplot as plt
%matplotlib inline
import istat
# The next step is to set a cache dir where the json files downloaded from Istat are stored.
# Storing files on disk speeds up development and assures consistent results over time.
# If needed, you can delete the downloaded files to get a fresh copy.
cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
istat.lang(0) # set italian language
print("cache_dir is '{}'".format(istat.cache_dir()))
cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'
# List all datasets contained in the area `LAB` (Labour)
istat.area('LAB').datasets()
cod | name | dim |
---|---|---|
DCCV_COMPL | Indicatori complementari | 12 |
DCCV_DISOCCUPT | Disoccupati | 10 |
DCCV_DISOCCUPTDE | Disoccupati - dati destagionalizzati | 7 |
DCCV_DISOCCUPTMENS | Disoccupati - dati mensili | 8 |
DCCV_FORZLV | Forze di lavoro | 8 |
DCCV_FORZLVDE | Forze di lavoro - dati destagionalizzati | 7 |
DCCV_FORZLVMENS | Forze lavoro - dati mensili | 8 |
DCCV_INATTIV | Inattivi | 11 |
DCCV_INATTIVDE | Inattivi - dati destagionalizzati | 7 |
DCCV_INATTIVMENS | Inattivi - dati mensili | 8 |
DCCV_NEET | NEET (giovani non occupati e non in istruzione e formazione) | 10 |
DCCV_OCCUPATIMENS | Occupati - dati mensili | 8 |
DCCV_OCCUPATIT | Occupati | 14 |
DCCV_OCCUPATITDE | Occupati - dati destagionalizzati | 8 |
DCCV_ORELAVMED | Occupati per ore settimanali lavorate e numero di ore settimanali lavorate procapite | 12 |
DCCV_TAXATVT | Tasso di attività | 8 |
DCCV_TAXATVTDE | Tasso di attività - dati destagionalizzati | 7 |
DCCV_TAXATVTMENS | Tasso di attività - dati mensili | 8 |
DCCV_TAXDISOCCU | Tasso di disoccupazione | 9 |
DCCV_TAXDISOCCUDE | Tasso di disoccupazione - dati destagionalizzati | 7 |
DCCV_TAXDISOCCUMENS | Tasso di disoccupazione - dati mensili | 8 |
DCCV_TAXINATT | Tasso di inattività | 8 |
DCCV_TAXINATTDE | Tasso di inattività - dati destagionalizzati | 7 |
DCCV_TAXINATTMENS | Tasso di inattività - dati mensili | 8 |
DCCV_TAXOCCU | Tasso di occupazione | 8 |
DCCV_TAXOCCUDE | Tasso di occupazione - dati destagionalizzati | 7 |
DCCV_TAXOCCUMENS | Tasso di occupazione - dati mensili | 8 |
DCIS_RICSTAT | Ricostruzione statistica delle serie regionali di popolazione del periodo 1/1/2002-1/1/2014 | 6 |
DCSC_COSTLAVSTRUT_1 | Struttura del costo del lavoro (indagine quadriennale) | 6 |
DCSC_COSTLAVULAOROS_1 | Indicatori del costo del lavoro per Ula - dati trimestrali | 5 |
DCSC_GI_COS | Costo del lavoro nelle imprese con almeno 500 dipendenti - dati mensili | 6 |
DCSC_GI_OCC | Occupazione dipendente, tassi di ingresso e uscita nelle imprese con almeno 500 dipendenti - dati mensili | 6 |
DCSC_GI_ORE | Ore lavorate nelle imprese con almeno 500 dipendenti - dati mensili | 6 |
DCSC_GI_RE | Retribuzione lorda nelle imprese con almeno 500 dipendenti - dati mensili | 6 |
DCSC_ORE10_1 | Ore lavorate nelle imprese con almeno 10 dipendenti - dati trimestrali | 5 |
DCSC_OROS_1 | Indice delle posizioni lavorative alle dipendenze - dati trimestrali | 5 |
DCSC_POSTIVAC_1 | Tasso di posti vacanti - dati trimestrali | 5 |
DCSC_RETRATECO1 | Retribuzioni contrattuali per Ateco 2007 | 6 |
DCSC_RETRCASSCOMPPA | Retribuzione contrattuale di cassa e di competenza per dipendente della pubblica amministrazione per contratto - dati annuali - euro | 7 |
DCSC_RETRCONTR1C | Retribuzioni contrattuali per contratto - dati mensili e annuali . | 6 |
DCSC_RETRCONTR1O | Orario contrattuale annuo lordo, netto, ferie e altre ore di riduzione | 6 |
DCSC_RETRCONTR1T | Indicatori di tensione contrattuale - dati mensili e annuali | 6 |
DCSC_RETRULAOROS_1 | Indice delle retribuzioni lorde per Ula - dati trimestrali | 5 |
Download the number of employed people (occupati), the number of unemployed people (disoccupati) and the labour force (forza lavoro), and check that employed + unemployed = labour force (the check is sketched after the downloads below).
Download Occupati¶
# DCCV_OCCUPATIT
istat_occupatit = istat.dataset('LAB', 'DCCV_OCCUPATIT')
# HTML(istat_occupatit.info_dimensions_as_html(show_values=0))
spec = {
    'Territorio': 'Italia',
    'Sesso': 'totale',
    'Classe di età': '15 anni e più',
    'Titolo di studio': 'totale',
    'Cittadinanza': 'totale',
    'Ateco 2002': '0010 totale',
    'Ateco 2007': '0010 totale',
    'Posizione professionale': 'totale',
    'Profilo professionale': 'totale',
    'Professione 2001': 'totale',
    'Professione 2011': 'totale',
    'Regime orario': 'totale',
    'Carattere occupazione': 'totale',
    'Tempo e frequenza': 0
}
df_occupatit = istat_occupatit.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_occupatit = df_occupatit[df_occupatit['Tempo e frequenza'].str.contains(r'^T.*')]
df_occupatit.tail(6)
Tempo e frequenza | Value | |
---|---|---|
187 | T2-2014 | 22316.76 |
188 | T3-2014 | 22398.30 |
189 | T4-2014 | 22374.93 |
190 | T1-2015 | 22158.45 |
191 | T2-2015 | 22496.79 |
192 | T3-2015 | 22645.07 |
df_occupatit.ix[192]
Tempo e frequenza T3-2015
Value 22645.1
Name: 192, dtype: object
Download disoccupati¶
istat_disoccupt = istat.dataset('LAB', 'DCCV_DISOCCUPT')
istat_disoccupt
nr | name | nr. values | values (first 3 values) |
---|---|---|---|
0 | Territorio | 136 | 1:'Italia', 3:'Nord', 4:'Nord-ovest' ... |
1 | Tipo dato | 1 | 2:'numero di persone in cerca di occupazione 15 anni e oltre (valori in migliaia)' |
2 | Misura | 1 | 9:'valori assoluti' |
3 | Sesso | 3 | 1:'maschi', 2:'femmine', 3:'totale' ... |
4 | Classe di età | 11 | 17:'45-54 anni', 4:'15-24 anni', 21:'55-64 anni' ... |
5 | Titolo di studio | 5 | 11:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio' ... |
6 | Cittadinanza | 3 | 1:'italiano-a', 2:'straniero-a', 3:'totale' ... |
7 | Condizione professionale | 4 | 3:'disoccupati ex-occupati', 4:'disoccupati ex-inattivi', 5:'disoccupati senza esperienza di lavoro' ... |
8 | Durata disoccupazione | 2 | 2:'12 mesi o più', 3:'totale' |
9 | Tempo e frequenza | 193 | 1536:'T4-1980', 2049:'T4-2007', 1540:'1981' ... |
spec = {
    'Territorio': 'Italia',
    'Tipo dato': 'numero di persone in cerca di occupazione 15 anni e oltre (valori in migliaia)',
    'Misura': 'valori assoluti',
    'Sesso': 'totale',
    'Classe di età': '15 anni e più',
    'Titolo di studio': 'totale',
    'Cittadinanza': 'totale',
    'Condizione professionale': 'totale',
    'Durata disoccupazione': 'totale',
    'Tempo e frequenza': 0
}
df_disoccupt = istat_disoccupt.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
df_disoccupt = df_disoccupt[df_disoccupt['Tempo e frequenza'].str.contains(r'^T.*')]
df_disoccupt.tail(6)
Tempo e frequenza | Value | |
---|---|---|
187 | T2-2014 | 3102.39 |
188 | T3-2014 | 2975.40 |
189 | T4-2014 | 3419.51 |
190 | T1-2015 | 3301.81 |
191 | T2-2015 | 3101.50 |
192 | T3-2015 | 2676.55 |
Download forza Lavoro¶
istat_forzlv = istat.dataset('LAB', 'DCCV_FORZLV')
istat_forzlv
nr | name | nr. values | values (first 3 values) |
---|---|---|---|
0 | Territorio | 136 | 1:'Italia', 3:'Nord', 4:'Nord-ovest' ... |
1 | Tipo dato | 1 | 3:'numero di forze di lavoro15 anni e oltre (valori in migliaia)' |
2 | Misura | 1 | 9:'valori assoluti' |
3 | Sesso | 3 | 1:'maschi', 2:'femmine', 3:'totale' ... |
4 | Classe di età | 10 | 17:'45-54 anni', 4:'15-24 anni', 21:'55-64 anni' ... |
5 | Titolo di studio | 5 | 11:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio' ... |
6 | Cittadinanza | 3 | 1:'italiano-a', 2:'straniero-a', 3:'totale' ... |
7 | Tempo e frequenza | 193 | 1536:'T4-1980', 2049:'T4-2007', 1540:'1981' ... |
spec = {
    'Territorio': 'Italia',
    'Tipo dato': 'numero di forze di lavoro15 anni e oltre (valori in migliaia)',
    'Misura': 'valori assoluti',
    'Sesso': 'totale',
    'Classe di età': '15 anni e più',
    'Titolo di studio': 'totale',
    'Cittadinanza': 'totale',
    'Tempo e frequenza': 0
}
df_forzlv = istat_forzlv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
# df_forzlv
# df_forzlv = df_forzlv.dropna()
df_forzlv = df_forzlv[df_forzlv['Tempo e frequenza'].str.contains(r'^T.*')]
df_forzlv.tail(6)
Tempo e frequenza | Value | |
---|---|---|
187 | T2-2014 | 25419.15 |
188 | T3-2014 | 25373.70 |
189 | T4-2014 | 25794.44 |
190 | T1-2015 | 25460.25 |
191 | T2-2015 | 25598.29 |
192 | T3-2015 | 25321.61 |
Download inattivi¶
istat_inattiv = istat.dataset('LAB', 'DCCV_INATTIV')
istat.options.display.max_rows = 0
# HTML(istat_inattiv.info_dimensions_as_html(show_values=0))
istat_inattiv
nr | name | nr. values | values (alls values) |
---|---|---|---|
0 | Territorio | 136 | 1:'Italia', 3:'Nord', 4:'Nord-ovest', 5:'Piemonte', 6:'Torino', 7:'Vercelli', 8:'Biella', 9:'Verbano-Cusio-Ossola', 10:'Novara', 11:'Cuneo', 12:'Asti', 13:'Alessandria', 14:'Valle d'Aosta / Vallée d'Aoste', 15:'Valle d'Aosta / Vallée d'Aoste', 16:'Liguria', 17:'Imperia', 18:'Savona', 19:'Genova', 20:'La Spezia', 21:'Lombardia', 22:'Varese', 23:'Como', 24:'Lecco', 25:'Sondrio', 26:'Milano', 27:'Bergamo', 28:'Brescia', 29:'Pavia', 30:'Lodi', 31:'Cremona', 32:'Mantova', 33:'Nord-est', 34:'Trentino Alto Adige / Südtirol', 35:'Provincia Autonoma Bolzano / Bozen', 37:'Provincia Autonoma Trento', 39:'Veneto', 40:'Verona', 41:'Vicenza', 42:'Belluno', 43:'Treviso', 44:'Venezia', 45:'Padova', 46:'Rovigo', 47:'Friuli-Venezia Giulia', 48:'Pordenone', 49:'Udine', 50:'Gorizia', 51:'Trieste', 52:'Emilia-Romagna', 53:'Piacenza', 54:'Parma', 55:'Reggio nell'Emilia', 56:'Modena', 57:'Bologna', 58:'Ferrara', 59:'Ravenna', 60:'Forlì-Cesena', 61:'Rimini', 62:'Centro', 63:'Toscana', 64:'Massa-Carrara', 65:'Lucca', 66:'Pistoia', 67:'Firenze', 68:'Prato', 69:'Livorno', 70:'Pisa', 71:'Arezzo', 72:'Siena', 73:'Grosseto', 74:'Umbria', 75:'Perugia', 76:'Terni', 77:'Marche', 78:'Pesaro e Urbino', 79:'Ancona', 80:'Macerata', 81:'Ascoli Piceno', 82:'Lazio', 83:'Viterbo', 84:'Rieti', 85:'Roma', 86:'Latina', 87:'Frosinone', 88:'Mezzogiorno', 90:'Abruzzo', 91:'L'Aquila', 92:'Teramo', 93:'Pescara', 94:'Chieti', 95:'Molise', 96:'Isernia', 97:'Campobasso', 98:'Campania', 99:'Caserta', 100:'Benevento', 101:'Napoli', 102:'Avellino', 103:'Salerno', 104:'Puglia', 105:'Foggia', 106:'Bari', 107:'Taranto', 108:'Brindisi', 109:'Lecce', 110:'Basilicata', 111:'Potenza', 112:'Matera', 113:'Calabria', 114:'Cosenza', 115:'Crotone', 116:'Catanzaro', 117:'Vibo Valentia', 118:'Reggio di Calabria', 120:'Sicilia', 121:'Trapani', 122:'Palermo', 123:'Messina', 124:'Agrigento', 125:'Caltanissetta', 126:'Enna', 127:'Catania', 128:'Ragusa', 129:'Siracusa', 130:'Sardegna', 131:'Sassari', 132:'Nuoro', 133:'Cagliari', 134:'Oristano', 135:'Olbia-Tempio', 136:'Ogliastra', 137:'Medio Campidano', 138:'Carbonia-Iglesias', 146:'Monza e della Brianza', 147:'Fermo', 148:'Barletta-Andria-Trani' |
1 | Tipo dato | 2 | 3:'numero di forze di lavoro15 anni e oltre (valori in migliaia)', 4:'numero di inattivi (valori in migliaia)' |
2 | Misura | 1 | 9:'valori assoluti' |
3 | Sesso | 3 | 1:'maschi', 2:'femmine', 3:'totale' |
4 | Classe di età | 12 | 1:'0-14 anni', 4:'15-24 anni', 7:'15-34 anni', 8:'25-34 anni', 10:'35-64 anni', 14:'35-44 anni', 17:'45-54 anni', 21:'55-64 anni', 22:'15-64 anni', 25:'65 anni e più', 28:'15 anni e più', 29:'totale' |
5 | Titolo di studio | 5 | 11:'laurea e post-laurea', 12:'totale', 3:'licenza di scuola elementare, nessun titolo di studio', 4:'licenza di scuola media', 7:'diploma' |
6 | Cittadinanza | 3 | 1:'italiano-a', 2:'straniero-a', 3:'totale' |
7 | Condizione professionale | 9 | 6:'inattivi in età lavorativa', 7:'cercano lavoro non attivamente', 8:'cercano lavoro ma non disponibili a lavorare', 9:'non cercano ma disponibili a lavorare', 10:'non cercano e non disponibili a lavorare', 11:'inattivi in età non lavorativa', 12:'non forze di lavoro fino a 14 anni', 13:'non forze di lavoro di 65 anni e più', 14:'totale' |
8 | Motivo inattività | 7 | 1:'scoraggiamento', 2:'motivi familiari', 3:'studio, formazione professionale', 4:'aspetta esiti passate azioni di ricerca', 5:'pensione, non interessa anche per motivi di età', 6:'altri motivi', 7:'totale' |
9 | Condizione dichiarata | 8 | 1:'occupato', 6:'disoccupato alla ricerca di nuova occupazione', 7:'in cerca di prima occupazione', 8:'casalinga-o', 9:'studente', 10:'ritirato-a dal lavoro', 11:'in altra condizione', 12:'totale' |
10 | Tempo e frequenza | 193 | 1536:'T4-1980', 2049:'T4-2007', 1540:'1981', 2053:'2008', 1542:'T1-1981', 2055:'T1-2008', 1546:'T2-1981', 2059:'T2-2008', 1551:'T3-1981', 2064:'T3-2008', 1555:'T4-1981', 2068:'T4-2008', 1559:'1982', 2072:'2009', 1561:'T1-1982', 2074:'T1-2009', 1565:'T2-1982', 2078:'T2-2009', 1570:'T3-1982', 2083:'T3-2009', 1574:'T4-1982', 2087:'T4-2009', 1578:'1983', 2091:'2010', 1580:'T1-1983', 2093:'T1-2010', 1584:'T2-1983', 2097:'T2-2010', 1589:'T3-1983', 2102:'T3-2010', 1593:'T4-1983', 2106:'T4-2010', 1597:'1984', 2110:'2011', 1599:'T1-1984', 2112:'T1-2011', 1603:'T2-1984', 2116:'T2-2011', 1608:'T3-1984', 2121:'T3-2011', 1612:'T4-1984', 2125:'T4-2011', 1616:'1985', 2129:'2012', 1618:'T1-1985', 2131:'T1-2012', 1622:'T2-1985', 2135:'T2-2012', 1627:'T3-1985', 2140:'T3-2012', 1631:'T4-1985', 2144:'T4-2012', 1635:'1986', 2148:'2013', 1637:'T1-1986', 2150:'T1-2013', 1641:'T2-1986', 2154:'T2-2013', 1646:'T3-1986', 2159:'T3-2013', 1650:'T4-1986', 2163:'T4-2013', 1654:'1987', 2167:'2014', 1656:'T1-1987', 2169:'T1-2014', 1660:'T2-1987', 2173:'T2-2014', 1665:'T3-1987', 2178:'T3-2014', 1669:'T4-1987', 2182:'T4-2014', 1673:'1988', 1675:'T1-1988', 2188:'T1-2015', 1679:'T2-1988', 2192:'T2-2015', 1684:'T3-1988', 2197:'T3-2015', 1688:'T4-1988', 1692:'1989', 1694:'T1-1989', 1698:'T2-1989', 1703:'T3-1989', 1707:'T4-1989', 1711:'1990', 1713:'T1-1990', 1717:'T2-1990', 1722:'T3-1990', 1726:'T4-1990', 1730:'1991', 1732:'T1-1991', 1736:'T2-1991', 1741:'T3-1991', 1745:'T4-1991', 1749:'1992', 1751:'T1-1992', 1755:'T2-1992', 1760:'T3-1992', 1764:'T4-1992', 1768:'1993', 1770:'T1-1993', 1774:'T2-1993', 1779:'T3-1993', 1783:'T4-1993', 1787:'1994', 1789:'T1-1994', 1793:'T2-1994', 1798:'T3-1994', 1802:'T4-1994', 1806:'1995', 1808:'T1-1995', 1812:'T2-1995', 1817:'T3-1995', 1821:'T4-1995', 1825:'1996', 1827:'T1-1996', 1831:'T2-1996', 1836:'T3-1996', 1840:'T4-1996', 1844:'1997', 1846:'T1-1997', 1850:'T2-1997', 1855:'T3-1997', 1859:'T4-1997', 1863:'1998', 1865:'T1-1998', 1869:'T2-1998', 1874:'T3-1998', 1878:'T4-1998', 1882:'1999', 1884:'T1-1999', 1888:'T2-1999', 1893:'T3-1999', 1897:'T4-1999', 1901:'2000', 1903:'T1-2000', 1907:'T2-2000', 1912:'T3-2000', 1916:'T4-2000', 1920:'2001', 1922:'T1-2001', 1926:'T2-2001', 1931:'T3-2001', 1935:'T4-2001', 1939:'2002', 1941:'T1-2002', 1945:'T2-2002', 1950:'T3-2002', 1954:'T4-2002', 1958:'2003', 1960:'T1-2003', 1964:'T2-2003', 1969:'T3-2003', 1973:'T4-2003', 1464:'1977', 1977:'2004', 1466:'T1-1977', 1979:'T1-2004', 1470:'T2-1977', 1983:'T2-2004', 1475:'T3-1977', 1988:'T3-2004', 1479:'T4-1977', 1992:'T4-2004', 1483:'1978', 1996:'2005', 1485:'T1-1978', 1998:'T1-2005', 1489:'T2-1978', 2002:'T2-2005', 1494:'T3-1978', 2007:'T3-2005', 1498:'T4-1978', 2011:'T4-2005', 1502:'1979', 2015:'2006', 1504:'T1-1979', 2017:'T1-2006', 1508:'T2-1979', 2021:'T2-2006', 1513:'T3-1979', 2026:'T3-2006', 1517:'T4-1979', 2030:'T4-2006', 1521:'1980', 2034:'2007', 1523:'T1-1980', 2036:'T1-2007', 1527:'T2-1980', 2040:'T2-2007', 1532:'T3-1980', 2045:'T3-2007' |
spec = {
    'Territorio': 'Italia',
    'Tipo dato': 'numero di inattivi (valori in migliaia)',
    'Misura': 'valori assoluti',
    'Sesso': 'totale',
    'Classe di età': '15 anni e più',
    'Titolo di studio': 'totale',
    'Cittadinanza': 'totale',
    'Condizione professionale': 'totale',
    'Motivo inattività': 'totale',
    'Condizione dichiarata': 'totale',
    'Tempo e frequenza': 0
}
df_inattiv = istat_inattiv.getvalues(spec).dataset(0).to_table(rtype=pd.DataFrame)
# df_inattiv
df_inattiv = df_inattiv[df_inattiv['Tempo e frequenza'].str.contains(r'^T.*')]
df_inattiv.tail(6)
Tempo e frequenza | Value | |
---|---|---|
187 | T2-2014 | 26594.57 |
188 | T3-2014 | 26646.90 |
189 | T4-2014 | 26257.15 |
190 | T1-2015 | 26608.07 |
191 | T2-2015 | 26487.67 |
192 | T3-2015 | 26746.26 |
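The check announced at the beginning of this notebook (employed + unemployed = labour force) can now be carried out with the three quarterly series downloaded above; a sketch:
check = pd.merge(df_occupatit, df_disoccupt, on='Tempo e frequenza',
                 suffixes=('_occupati', '_disoccupati'))
check = pd.merge(check, df_forzlv, on='Tempo e frequenza')
check = check.rename(columns={'Value': 'Value_forzlv'})
check['difference'] = (check['Value_occupati'] + check['Value_disoccupati']
                       - check['Value_forzlv'])
check[['Tempo e frequenza', 'difference']].tail(6)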
Tutorial¶
The parrot module is a module about parrots.
Doctest example:
>>> 2 + 2
4
Test-Output example:
json_string = '''
{
"label" : "concepts",
"category" : {
"index" : {
"POP" : 0,
"PERCENT" : 1
},
"label" : {
"POP" : "population",
"PERCENT" : "weight of age group in the population"
},
"unit" : {
"POP" : {
"label": "thousands of persons",
"decimals": 1,
"type" : "count",
"base" : "people",
"multiplier" : 3
},
"PERCENT" : {
"label" : "%",
"decimals": 1,
"type" : "ratio",
"base" : "per cent",
"multiplier" : 0
}
}
}
}
'''
print(2 + 2)
This would output:
4
Api Reference¶
Jsonstat Module¶
The jsonstat module contains classes and utility functions to parse the jsonstat data format.
Utility functions¶
jsonstat.from_file(filename)
    Read a file containing data in jsonstat format and return the appropriate object.
    Parameters: filename – file containing a jsonstat
    Returns: a JsonStatCollection, JsonStatDataset or JsonStatDimension object
    Example:
>>> import os, jsonstat
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> o = jsonstat.from_file(filename)
>>> type(o)
<class 'jsonstat.collection.JsonStatCollection'>
jsonstat.from_url(url, pathname=None)
    Download a url and return the downloaded content. See jsonstat.download() for how to use the pathname parameter.
    Parameters:
    - url – ex.: http://json-stat.org/samples/oecd-canada.json
    - pathname – if pathname is defined, the contents of the url will be stored into the file <cache_dir>/pathname; if pathname is None the filename will be automatically generated; if pathname is an absolute path, cache_dir will be ignored.
    Returns: the contents of the url
    To set the dir where downloaded files are stored see jsonstat.cache_dir(). The cache expiration policy can be customized.
    Example:
>>> import jsonstat
>>> # cache_dir = os.path.normpath(os.path.join(jsonstat.__fixtures_dir, "json-stat.org"))
>>> # download external content into the /tmp dir so next downloads can be faster
>>> uri = 'http://json-stat.org/samples/oecd-canada.json'
>>> jsonstat.cache_dir("/tmp")
'/tmp'
>>> o = jsonstat.from_url(uri)
>>> print(o)
JsonstatCollection contains the following JsonStatDataSet:
+-----+----------+
| pos | dataset  |
+-----+----------+
| 0   | 'oecd'   |
| 1   | 'canada' |
+-----+----------+
jsonstat.from_json(json_data)
    Transform a json structure into a hierarchy of jsonstat objects.
    Parameters: json_data – data structure (dictionary) representing a json
    Returns: a JsonStatCollection, JsonStatDataset or JsonStatDimension object
    Example:
>>> import json, jsonstat
>>> from collections import OrderedDict
>>> json_string_v1 = '''{
...    "oecd" : {
...        "value": [1],
...        "dimension" : {
...            "id": ["one"],
...            "size": [1],
...            "one": { "category": { "index":{"2010":0 } } }
...        }
...    }
... }'''
>>> json_data = json.loads(json_string_v1, object_pairs_hook=OrderedDict)
>>> jsonstat.from_json(json_data)
JsonstatCollection contains the following JsonStatDataSet:
+-----+---------+
| pos | dataset |
+-----+---------+
| 0   | 'oecd'  |
+-----+---------+
jsonstat.from_string(json_string)
    Parse a jsonstat string and return the appropriate object.
    Parameters: json_string – string containing a json
    Returns: a JsonStatCollection, JsonStatDataset or JsonStatDimension object
jsonstat.cache_dir(cached_dir=u'', time_to_live=None)
    Manage the directory cached_dir where downloaded files are stored.
    Called without parameters it returns the cached_dir directory; called with a parameter it sets the directory.
    Parameters:
    - cached_dir –
    - time_to_live –
jsonstat.download(url, pathname=None)
    Download a url and return the downloaded content.
    Parameters:
    - url – ex.: http://json-stat.org/samples/oecd-canada.json
    - pathname – if pathname is defined, the contents of the url will be stored into the file <cache_dir>/pathname; if pathname is None the filename will be automatically generated; if pathname is an absolute path, cache_dir will be ignored.
    Returns: the contents of the url
    To set the dir where downloaded files are stored see jsonstat.cache_dir(). The cache expiration policy can be customized.
JsonStatCollection¶
class jsonstat.JsonStatCollection
    Represents a jsonstat collection. It contains one or more datasets.
>>> import os, jsonstat
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> collection = jsonstat.from_file(filename)
>>> len(collection)
2
>>> collection
JsonstatCollection contains the following JsonStatDataSet:
+-----+-----------------------------------------------------+
| pos | dataset                                             |
+-----+-----------------------------------------------------+
| 0   | 'Unemployment rate in the OECD countries 2003-2014' |
| 1   | 'Population by sex and age group. Canada. 2012'     |
+-----+-----------------------------------------------------+
parsing¶
JsonStatCollection.from_file()
    Initialize this collection from a file. It is better to use jsonstat.from_file().
    Parameters: filename – name of a file containing a jsonstat
    Returns: itself to chain calls

JsonStatCollection.from_string()
    Initialize this collection from a string. It is better to use jsonstat.from_string().
    Parameters: json_string – string containing a json
    Returns: itself to chain calls

JsonStatCollection.from_json()
    Initialize this collection from a json structure. It is better to use jsonstat.from_json().
    Parameters: json_data – data structure (dictionary) representing a json
    Returns: itself to chain calls
JsonStatDataSet¶
class jsonstat.JsonStatDataSet(name=None)
    Represents a JsonStat dataset.
>>> import os, jsonstat
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> dataset = jsonstat.from_file(filename).dataset(0)
>>> dataset.label
'Unemployment rate in the OECD countries 2003-2014'
>>> print(dataset)
name:  'Unemployment rate in the OECD countries 2003-2014'
label: 'Unemployment rate in the OECD countries 2003-2014'
size: 432
+-----+---------+--------------------------------+------+--------+
| pos | id      | label                          | size | role   |
+-----+---------+--------------------------------+------+--------+
| 0   | concept | indicator                      | 1    | metric |
| 1   | area    | OECD countries, EU15 and total | 36   | geo    |
| 2   | year    | 2003-2014                      | 12   | time   |
+-----+---------+--------------------------------+------+--------+
>>> dataset.dimension(1)
+-----+--------+----------------------------+
| pos | idx    | label                      |
+-----+--------+----------------------------+
| 0   | 'AU'   | 'Australia'                |
| 1   | 'AT'   | 'Austria'                  |
| 2   | 'BE'   | 'Belgium'                  |
| 3   | 'CA'   | 'Canada'                   |
| 4   | 'CL'   | 'Chile'                    |
| 5   | 'CZ'   | 'Czech Republic'           |
| 6   | 'DK'   | 'Denmark'                  |
| 7   | 'EE'   | 'Estonia'                  |
| 8   | 'FI'   | 'Finland'                  |
| 9   | 'FR'   | 'France'                   |
| 10  | 'DE'   | 'Germany'                  |
| 11  | 'GR'   | 'Greece'                   |
| 12  | 'HU'   | 'Hungary'                  |
| 13  | 'IS'   | 'Iceland'                  |
| 14  | 'IE'   | 'Ireland'                  |
| 15  | 'IL'   | 'Israel'                   |
| 16  | 'IT'   | 'Italy'                    |
| 17  | 'JP'   | 'Japan'                    |
| 18  | 'KR'   | 'Korea'                    |
| 19  | 'LU'   | 'Luxembourg'               |
| 20  | 'MX'   | 'Mexico'                   |
| 21  | 'NL'   | 'Netherlands'              |
| 22  | 'NZ'   | 'New Zealand'              |
| 23  | 'NO'   | 'Norway'                   |
| 24  | 'PL'   | 'Poland'                   |
| 25  | 'PT'   | 'Portugal'                 |
| 26  | 'SK'   | 'Slovak Republic'          |
| 27  | 'SI'   | 'Slovenia'                 |
| 28  | 'ES'   | 'Spain'                    |
| 29  | 'SE'   | 'Sweden'                   |
| 30  | 'CH'   | 'Switzerland'              |
| 31  | 'TR'   | 'Turkey'                   |
| 32  | 'UK'   | 'United Kingdom'           |
| 33  | 'US'   | 'United States'            |
| 34  | 'EU15' | 'Euro area (15 countries)' |
| 35  | 'OECD' | 'total'                    |
+-----+--------+----------------------------+
>>> dataset.data(0)
JsonStatValue(idx=0, value=5.943826289, status=None)
__init__(name=None)
    Initialize an empty dataset. A dataset can have a name (key) if we parse a jsonstat format version 1.
    Parameters: name – dataset name (for jsonstat v.1)

name()
    Getter: returns the name of the dataset. Type: string

label()
    Getter: returns the label of the dataset. Type: string
dimensions¶
querying methods¶
JsonStatDataSet.data(*args, **kargs)
    Returns a JsonStatValue containing the value and status of a datapoint. The datapoint is retrieved according to the parameters.
    Parameters:
    - args –
      - data(<int>) where the int is an index into the value array
      - data(<list>) where lst = [i1, i2, i3, ...]; each i indicates a dimension and len(lst) == number of dimensions
      - data(<dict>) where dict is {k1: v1, k2: v2, ...}; dimensions of size 1 can be omitted
    - kargs –
      - data(k1=v1, k2=v2, ...) where ki is the id or label of a dimension and vi is the index or label of a category; dimensions of size 1 can be omitted
    Returns: a JsonStatValue object
    kargs is { cat1: value1, ..., cati: valuei, ... } where cati can be the id or the label of a dimension and valuei can be the index or the label of a category, e.g. {"country": "AU", "year": "2014"}
>>> import os, jsonstat
>>> filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
>>> dataset = jsonstat.from_file(filename).dataset(0)
>>> dataset.data(0)
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data(concept='UNR', area='AU', year='2003')
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data(area='AU', year='2003')
JsonStatValue(idx=0, value=5.943826289, status=None)
>>> dataset.data({'area':'AU', 'year':'2003'})
JsonStatValue(idx=0, value=5.943826289, status=None)
transforming¶
JsonStatDataSet.
to_table
(content=u'label', order=None, rtype=<type 'list'>, blocked_dims={}, value_column=u'Value', without_one_dimensions=False)[source]¶Transforms the dataset into a table (a list of rows).
The table length is the size of the dataset plus 1 for the header row.
Parameters:
- content – can be "label" or "id"
- order –
- rtype –
- blocked_dims –
Returns: a list of rows; the first row is the header
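A short usage sketch (it reuses the oecd-canada-col.json example file from the doctests above; the exact header strings depend on the content parameter):
import os
import jsonstat

filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
dataset = jsonstat.from_file(filename).dataset(0)

table = dataset.to_table(content='id')   # list of rows, row 0 is the header
print(table[0])      # header row: one column per dimension, plus the value column
print(table[1])      # first data row
print(len(table))    # dataset size + 1 for the header (432 + 1)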
JsonStatDataSet.
to_data_frame
(index=None, content=u'label', order=None, blocked_dims={}, value_column=u'Value')[source]¶Transforms the dataset into a pandas DataFrame.
For example, extract_bidimensional("year", "country") generates the following dataframe:

year | country
2010 | 1
2011 | 2
2012 | 3
Parameters:
- index –
- content –
- blocked_dims –
- order –
- value_column –
Returns: a pandas DataFrame
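A short usage sketch (same oecd-canada-col.json example file as above):
import os
import jsonstat

filename = os.path.join(jsonstat._examples_dir, "www.json-stat.org", "oecd-canada-col.json")
dataset = jsonstat.from_file(filename).dataset(0)

# 'year' becomes the DataFrame index; the other free dimensions become columns
df = dataset.to_data_frame('year', content='id')
print(df.head())

# freeze the 'area' dimension to Italy to obtain a single time series
df_it = dataset.to_data_frame('year', content='id', blocked_dims={'area': 'IT'})
print(df_it.head())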
parsing¶
JsonStatDataSet.
from_file
(filename)[source]¶Reads a jsonstat from a file and parses it to initialize this dataset.
It is better to use
jsonstat.from_file()
Parameters: filename – path of the file. Returns: itself to chain calls
JsonStatDataSet.
from_string
(json_string)[source]¶Parses a string containing a jsonstat and initializes this dataset.
It is better to use
jsonstat.from_string()
Parameters: json_string – string containing a jsonstat Returns: itself to chain calls
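As an illustration, here is a minimal sketch of parsing a dataset from a string. The JSON below is a toy, hand-written JSON-stat version 2 dataset (not one of the official samples), and the sketch assumes that jsonstat.from_string() recognizes a top-level object with "class": "dataset":
import jsonstat

# toy, hand-written JSON-stat v2 dataset (illustrative only)
json_string = '''
{
    "version" : "2.0",
    "class" : "dataset",
    "label" : "toy unemployment",
    "id" : ["area", "year"],
    "size" : [2, 2],
    "value" : [4.1, 4.5, 8.2, 8.0],
    "dimension" : {
        "area" : {"category" : {"index" : {"CA" : 0, "IT" : 1}}},
        "year" : {"category" : {"index" : {"2013" : 0, "2014" : 1}}}
    }
}
'''

# jsonstat.from_string() is the recommended entry point;
# JsonStatDataSet().from_string(json_string) initializes an existing instance instead
dataset = jsonstat.from_string(json_string)
print(dataset.value(area='IT', year='2014'))   # the last dimension varies fastest -> 8.0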
JsonStatDimension¶
-
class
jsonstat.
JsonStatDimension
(did=None, size=None, pos=None, role=None)[source]¶ Represents a JsonStat Dimension. It is contained in a JsonStat Dataset.
>>> from jsonstat import JsonStatDimension
>>> json_string = '''{
...                    "label" : "concepts",
...                    "category" : {
...                      "index" : { "POP" : 0, "PERCENT" : 1 },
...                      "label" : { "POP" : "population",
...                                  "PERCENT" : "weight of age group in the population" }
...                    }
...                  }
... '''
>>> dim = JsonStatDimension(did="concept", role="metric").from_string(json_string)
>>> len(dim)
2
>>> dim.category(0).index
'POP'
>>> dim.category('POP').label
'population'
>>> dim.category(1)
JsonStatCategory(label='weight of age group in the population', index='PERCENT', pos=1)
>>> print(dim)
+-----+-----------+-----------------------------------------+
| pos | idx       | label                                   |
+-----+-----------+-----------------------------------------+
| 0   | 'POP'     | 'population'                            |
| 1   | 'PERCENT' | 'weight of age group in the population' |
+-----+-----------+-----------------------------------------+
>>> json_string_dimension_sex = '''
... {
...     "label" : "sex",
...     "category" : {
...         "index" : {
...             "M" : 0,
...             "F" : 1,
...             "T" : 2
...         },
...         "label" : {
...             "M" : "men",
...             "F" : "women",
...             "T" : "total"
...         }
...     }
... }
... '''
>>> dim = JsonStatDimension(did="sex").from_string(json_string_dimension_sex)
>>> len(dim)
3
-
__init__
(did=None, size=None, pos=None, role=None)[source]¶ initialize a dimension
Warning
this is an internal library function (it is not public api)
Parameters: - did – id of dimension
- size – size of dimension (nr of values)
- pos – position of dimension into the dataset
- role – role of the dimension (time, geo or metric)
-
did
()¶ id of this dimension
-
label
()¶ label of this dimension
-
role
()¶ role of this dimension (can be time, geo or metric)
-
pos
()¶ position of this dimension with respect to the data set to which this dimension belongs
-
querying methods¶
parsing methods¶
JsonStatDimension.
from_string
(json_string)[source]¶Parses a json string describing a dimension.
Parameters: json_string – Returns: itself to chain calls
JsonStatDimension.
from_json
(json_data)[source]¶Parse a json structure representing a dimension
From json-stat.org
It is used to describe a particular dimension. The name of this object must be one of the strings in the id array. There must be one and only one dimension ID object for every dimension in the id array. The json schema for a dimension is approximately:
"dimension": {
    "type": "object",
    "properties": {
        "version": {"$ref": "#/definitions/version"},
        "href": {"$ref": "#/definitions/href"},
        "class": {"type": "string", "enum": ["dimension"]},
        "label": {"type": "string"},
        "category": {"$ref": "#/definitions/category"},
        "note": {"type": "array"}
    },
    "additionalProperties": false
},
Parameters: json_data – a parsed json structure (python dict) describing the dimension Returns: itself to chain calls
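A small sketch of from_json(); unlike from_string() it takes an already-parsed python structure. The dict below is a toy example in the same shape as the doctest above:
from jsonstat import JsonStatDimension

json_data = {
    "label": "sex",
    "category": {
        "index": {"M": 0, "F": 1, "T": 2},
        "label": {"M": "men", "F": "women", "T": "total"}
    }
}
dim = JsonStatDimension(did="sex").from_json(json_data)   # returns itself, so calls can be chained
print(len(dim))             # 3 categories
print(dim.category('F'))    # JsonStatCategory for 'women'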
Downloader helper¶
-
class
jsonstat.
Downloader
(cache_dir=u'./data', time_to_live=None)[source]¶ Helper class to download json stat files.
It has a very simple cache mechanism
-
download
(url, filename=None, time_to_live=None)[source]¶ Download url from internet.
Store the downloaded content into <cache_dir>/filename. If <cache_dir>/filename already exists, the cached content is returned from disk instead.
Parameters: - url – page to be downloaded
- filename – filename where to store the content of the url; None to skip storing it on disk
- time_to_live – how many seconds the cached file stays valid; None uses the default time_to_live, 0 ignores any cached version
Returns: the content of url (str type)
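A usage sketch, assuming network access; the URL is the json-stat.org sample used elsewhere in these docs and the cache directory is an arbitrary choice:
import jsonstat

# files are cached under /tmp/jsonstat_cache and re-downloaded after time_to_live seconds
downloader = jsonstat.Downloader(cache_dir="/tmp/jsonstat_cache", time_to_live=3600)
json_string = downloader.download("http://json-stat.org/samples/oecd-canada.json",
                                  "oecd-canada.json")
collection = jsonstat.from_string(json_string)
print(collection)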
-
collection := {
    [ "version" ":" `string` ]
    [ "class" ":" "collection" ]
    [ "href" ":" `url` ]
    [ "updated" ":" `date` ]
    link : {
        item : [
            ( dataset )+
        ]
    }
}
dataset := {
"version" : <version>
"class" : "dataset",
"href" : <url>
"label" : <string>
"id" : [ <string>+] # ex. "id" : ["metric", "time", "geo", "sex"],
"size" : [ <int>, <int>, ... ]
"role" : roles of dimension
"value" : [<int>, <int> ]
"status" : status
"dimension" : { <dimension_id> : dimension, ...}
"link" :
}
dimension_id := <string>
# possible values of dimension are called categories
dimension := {
    "label" : <string>
    "class" : "dimension"
    "category" : {
        "index" : dimension_index
        "label" : dimension_label
        "child" : dimension_child
        "coordinates" :
        "unit" : dimension_unit
    }
}
dimension_index :=
{ <cat1>:int, ....} # { "2003" : 0, "2004" : 1, "2005" : 2, "2006" : 3 }
|
[ <cat1>, <cat2> ] # [ 2003, 2004 ]
dimension_label :=
    { <cat1> : <label1>, ... }    # ex. { "POP" : "population", "PERCENT" : "weight of age group in the population" }
Istat Module¶
This module contains helper classes useful for exploring the data published by Istat, the Italian National Institute of Statistics.
Utility Function¶
-
istat.
cache_dir
(cache_dir=None, time_to_live=None)[source]¶ Manage the directory cache_dir where downloaded files are stored.
Called without a parameter it returns the current directory; called with a parameter it sets the directory.
Parameters: - cache_dir – directory where downloaded files are stored
- time_to_live – how many seconds a cached file stays valid
-
istat.
areas
()[source]¶ returns a list of IstatArea objects representing all the areas used to classify datasets
IstatArea¶
-
class
istat.
IstatArea
(istat_helper, iid, cod, desc)[source]¶ Represents an area. An area contains a list of datasets. Instances of this class are built only by the Istat class.
-
cod
¶ returns the code of the area
-
dataset
(spec)[source]¶ Get an instance of IstatDataset by spec.
Parameters: spec – code of the dataset Returns: an IstatDataset instance
-
desc
¶ returns the description of the area
-
iid
¶ returns the id of the area
-
IstatDataset¶
-
class
istat.
IstatDataset
(istat_helper, dataset)[source]¶ -
cod
¶ returns the code of this dataset
-
dimension
(spec)[source]¶ Get dimension according to spec
Parameters: spec – can be an int or a string Returns: an IstatDimension instance
-
getvalues
(spec, rtype=<class jsonstat.collection.JsonStatCollection>)[source]¶ get values by dimensions
Parameters: - spec – it is a string for ex. “1,6,9,0,0”
- rtype – the requested return type (default JsonStatCollection)
Returns: if rtype is JsonStatCollection, returns an instance of JsonStatCollection; otherwise returns a json structure representing the istat dataset
-
name
¶ returns the name of this dataset
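A hedged exploration sketch for the istat module. It needs network access to the Istat web service; the area code 'LAB' and the dataset code 'DCCV_TAXDISOCCU' below are illustrative placeholders (pick real codes from the printed listing), and the spec string follows the "1,6,9,0,0" format mentioned above:
import istat

istat.cache_dir('/tmp/istat_cache')            # where downloaded files are stored

for area in istat.areas():                     # areas used to classify datasets
    print(area.iid, area.cod, area.desc)

# hypothetical codes, not values taken from this documentation
area = next(a for a in istat.areas() if a.cod == 'LAB')
ds = area.dataset('DCCV_TAXDISOCCU')
print(ds.name)

collection = ds.getvalues("1,6,9,0,0")         # by default returns a JsonStatCollection
print(collection)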
-
jsonstat.py¶

jsonstat.py is a library for reading the JSON-stat data format maintained and promoted by Xavier Badosa. The JSON-stat format is a JSON format for publishing dataset. JSON-stat is used by several institutions to publish statistical data. An incomplete list is:
- Eurostat, which provides statistical information about the European Union (EU)
- Italian National Institute of Statistics Istat
- Central Statistics Office of Ireland
- United Nations Economic Commission for Europe (UNECE) statistical data are here
- Statistics Norway
- UK Office for national statistics see their blog post
- others...
The jsonstat.py library tries to mimic, as much as possible in Python, the json-stat Javascript Toolkit. One of the library's objectives is to be helpful in exploring datasets using Jupyter (IPython) notebooks.
For a quick overview of the features you can start from the example notebook oecd-canada-jsonstat_v1.html. You can also check out some of the Jupyter example notebooks in the examples directory on github or in the documentation.
As a bonus, jsonstat.py contains useful classes to explore datasets published by Istat.
You may also find useful another Python library for the json-stat format, pyjstat by Miguel Expósito Martín.
This library is in beta status. I am actively working on it and hope to improve this project. For any comment feel free to contact me at gf@26fe.com.
You can find the source at github, where you can open a ticket if you wish.
You can find the generated documentation at readthedocs.
Usage¶
Simple Usage¶
There is a simple command line interface, so you can experiment with parsing jsonstat files without writing code:
# parsing collection
$ jsonstat info --cache_dir /tmp http://json-stat.org/samples/oecd-canada.json
downloaded file(s) are stored into '/tmp'
download 'http://json-stat.org/samples/oecd-canada.json'
JsonStatCollection contains the following JsonStatDataSet:
+-----+----------+
| pos | dataset |
+-----+----------+
| 0 | 'oecd' |
| 1 | 'canada' |
+-----+----------+
# parsing dataset
$ jsonstat info --cache_dir /tmp "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?sex=T&precision=1&age=TOTAL&s_adj=NSA"
downloaded file(s) are stored into '/tmp'
download 'http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?sex=T&precision=1&age=TOTAL&s_adj=NSA'
name: 'Unemployment rate'
label: 'Unemployment rate'
size: 467
+-----+-------+-------+------+------+
| pos | id | label | size | role |
+-----+-------+-------+------+------+
| 0 | s_adj | s_adj | 1 | |
| 1 | age | age | 1 | |
| 2 | sex | sex | 1 | |
| 3 | geo | geo | 39 | |
| 4 | time | time | 12 | |
+-----+-------+-------+------+------+
code example:
url = 'http://json-stat.org/samples/oecd-canada.json'
collection = jsonstat.from_url(url)
# print list of dataset contained into the collection
print(collection)
# select the first dataset of the collection and print a short description
oecd = collection.dataset(0)
print(oecd)
# print description about each dimension of the dataset
for d in oecd.dimensions():
print(d)
# print a datapoint contained into the dataset
print(oecd.value(area='IT', year='2012'))
# convert a dataset in pandas dataframe
df = oecd.to_data_frame('year')
For more Python script examples see the examples directory.
For Jupyter (IPython) notebooks see the examples-notebooks directory.
Support¶
This is an open source project, maintained in my spare time. A particular feature or function that you would like may be missing, but things don't have to stay that way: you can contribute to the project's development yourself, or notify me and ask me to implement it.
Bug reports and feature requests should be submitted using the github issue tracker. Please provide a full traceback of any error you see and if possible a sample file. If you are unable to make a file publicly available then contact me at gf@26fe.com.
You can find support also on the google group.
How to Contribute Code¶
Any help will be greatly appreciated, just follow these steps:
- Fork it. Start a new fork for each independent feature; don't try to fix all problems at the same time, it's easier for those who will review and merge your changes.
- Create your feature branch (git checkout -b my-new-feature).
- Write your code. Add unit tests for your changes! If you added a whole new feature, or just improved something, you can be proud of it, so add yourself to the AUTHORS file :-) Update the docs!
- Commit your changes (git commit -am 'Added some feature').
- Push to the branch (git push origin my-new-feature).
- Create a new Pull Request. Click on the large "pull request" button on your repository. Wait for your code to be reviewed and, if you followed all these steps, merged into the main repository.
License¶
jsonstat.py is provided under the LGPL license. See LICENSE file.