Regionalization in Brightway2

bw2regional is a separate library that extends the Brightway LCA framework to do regionalized LCA calculations.


It is easy to do regionalized LCA incorrectly. This package tries to make it at least a bit easier to do regionalization correctly, for at least some definitions of correct.

bw2regional supports the following regionalization calculations:

  1. Inventory database and impact assessment method share the same spatial scale (Shared spatial scale)
  2. Inventory and impact assessment have different spatial scales (Two spatial scales)
  3. Inventory and impact assessment have different spatial scales, with background loading used for spatial allocation (Two spatial scales with generic loading)
  4. Inventory and impact assessment have different spatial scales, with extension tables used as a third spatial scale to allocate impact assessment units to inventory units (Extension tables).

In addition to making regionalized LCA calculations, maps of regionalized impact can be exported for calculation types 2-4.

Each separate spatial scale is stored as a geocollection. The relationships between spatial scales (i.e. how much area of unit a in spatial scale 1 intersects unit b in spatial scale 2) are stored as an Intersection. Areal intersection calculations are done using the separate utility pandarus, a library for matching spatial data sets and calculating their mutual intersected areas. Impact assessment methods store characterization factors per biosphere flow and spatial unit. Each of these components is described in more detail below.

bw2regional is part of a family of software libraries - see Regionalization libraries for more information.

Lifecycle of a regionalized LCA calculation

A new project is created, and bw2setup() is run. Background databases are imported, and some foreground data is added. bw2regionalsetup() is run (see Base data needed for regionalized calculations for more information), and the function fix_ecoinvent_database() is run for each inventory database which uses ecoinvent-specific location codes.

A regionalized LCIA method is chosen. One or more geocollections are created for the LCIA method, including specifying filepaths to the raster or vector datasets which define these geocollections and characterization factors. Regionalized characterization factors are imported from the spatial datasets using the function import_regionalized_cfs().

Most of the time, a third spatial scale is created, either a Loading or ExtensionTable. This third scale defines activity intensities within the larger inventory locations. Another geocollection is created for this third spatial scale, which includes specifying another spatial data file.

A functional unit for analysis is chosen, and the function check_needed_intersections() is called to get all necessary Intersection datasets. Most of the time, some intersections will need to be calculated, and the method remote.get_needed_intersections() can be called to do all necessary GIS calculations remotely. This step may take some time.

Finally, an LCA object is created. Depending on the study, and whether a third spatial scale was added, this will be an instance of OneSpatialScaleLCA, TwoSpatialScalesLCA, TwoSpatialScalesWithGenericLoadingLCA, or ExtensionTablesLCA. Regionalized results can be summarized numerically, or exported to maps using methods like write_results_to_ia_map(), write_results_to_inv_map(), and write_results_to_xtable_map().

Project setup

Please run the utility function bw2regionalsetup() in each new project that will do regionalized LCA calculations.

Any version of ecoinvent will need to be processed with the utility function fix_ecoinvent_database("some ecoinvent database name"). This function will do the following:

  • Convert any Rest-of-World locations into actual world regions and subsequently relabel them.
  • Relabel any ecoinvent-specific location names to place them inside the ecoinvent geocollection.

Spatial scales (geocollections)

A geocollection is a container for a set of locations. For example, the world geocollection contains a political version of the world, while other geocollections could list the watersheds or ecoregions of the world. A geocollection can be as simple as just a name, but if you want to be able to export maps or import characterization factors, the following fields should be provided:

  • filepath: The absolute filepath to the vector or raster dataset.
  • field: The unique field that identifies each feature, e.g. name for countries.
  • layer: The name of the layer. Only needed for vector datasets with more than one layer.
  • encoding: The text encoding. Only needed for shapefiles.
  • vfs: For shapefiles stored in zip archives, this is the virtual file system string, e.g. vfs="zip:///path/to/file/ne_50m_admin_0_countries.zip". Note that with a vfs, you must specify the filepath within the zip archive, e.g. filepath="/ne_50m_admin_0_countries.shp". See also the fiona manual.

Example creation of a geocollection:

from bw2regional import *
geocollections['water cfs'] = {'filepath': '/my/favorite/directory/some-raster-dataset.tiff'}

Default geocollections

The setup function bw2regionalsetup creates the following geocollections:

  • world: All countries in the world, as well as the global location GLO, which has no spatial definition.
  • ecoinvent: Ecoinvent-specific locations, including UN regions and subregions, composite geographies, and other ecoinvent-specific locations. See the ecoinvent geography report for more information.

Topocollections

For performance reasons, calculations done using large-scale and overlapping geocollections can be split into many smaller topological faces. Brightway2-regional will perform such splitting automatically for calculations using most LCI databases. See the technical documentation for more details.

Intersections

An Intersection is a dataset that shows the areal intersections between the spatial units of two geocollections. For example, if the world geocollection had the location “Netherlands”, and the watersheds geocollection had the location “Rhine watershed”, then the intersection (“world”, “watersheds”) would include the line ((“world”, “Netherlands”), (“watersheds”, “Rhine watershed”)): 423.7, though this number is made up. Areal intersections can be calculated using whatever GIS software and projection you want, but brightway2-regional is designed to work seamlessly with pandarus, which can quickly and painlessly do these areal intersection calculations and write data in a format brightway2-regional understands.

An Intersection object does not have its own name, but rather is defined by the two geocollections which are intersected.
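The following toy sketch shows what an areal intersection dataset contains: one area for every pair of overlapping spatial units, keyed by both (geocollection, id) pairs. Pandarus does this with real geometries. Here, the spatial units are hypothetical axis-aligned rectangles with made-up coordinates, so the sketch runs without any GIS library. The helpers rectangle_overlap and intersect are illustrative and are not part of bw2regional or pandarus.

```python
# Toy areal intersection. Real units are polygons or raster cells handled by
# pandarus; here each unit is a hypothetical rectangle (xmin, ymin, xmax, ymax).


def rectangle_overlap(a, b):
    """Return the area shared by two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def intersect(first, second):
    """Return {((gc1, id1), (gc2, id2)): area} for all overlapping pairs.

    first and second are (geocollection name, {unit id: rectangle}) tuples.
    Non-overlapping pairs are left out, as in a sparse intersection table.
    """
    name1, units1 = first
    name2, units2 = second
    result = {}
    for id1, rect1 in units1.items():
        for id2, rect2 in units2.items():
            area = rectangle_overlap(rect1, rect2)
            if area > 0:
                result[((name1, id1), (name2, id2))] = area
    return result


# Made-up shapes, not real borders or watersheds.
world = ("world", {"NL": (0, 0, 4, 4), "DE": (4, 0, 10, 6)})
watersheds = ("watersheds", {"Rhine watershed": (2, 1, 8, 3)})

table = intersect(world, watersheds)
for (unit1, unit2), area in table.items():
    print(unit1, unit2, area)
```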

Example use of pandarus (see also the pandarus documentation):

pandarus /Users/somebody/some-raster.tiff /Users/somebody/some-shapefile.shp --field2=name foo.bar

This would create the file foo.bar.json.bz2.

Example import of pandarus output data:

Note

You should create the geocollections first before creating an Intersection between them.

from bw2regional import *

geocollections['some-raster'] = {'filepath': '/Users/somebody/some-raster.tiff'}
geocollections['some-shapefile'] = {
    'filepath': '/Users/somebody/some-shapefile.shp',
    'field': 'name',
}
# The geocollection order must match the order of the inputs in the pandarus run
Intersection(('some-raster', 'some-shapefile')).import_from_pandarus('foo.bar.json.bz2')


Data and data formats

Geocollections

Geocollections are containers that organize and describe sets of spatial data identifiers, and possibly other data, including spatial supports. They are similar to the concept of Databases in Brightway2: an inventory dataset could be identified by ("My new database", "Dataset 14"), while a spatial unit could have a similar id: ("My new raster", "Cell 42, 11"). However, only metadata is stored for each geocollection, so there is no Geocollection object, only the geocollections metadata store.

Geocollections are used by both inventory data sets (e.g. for custom locations) and impact assessment methods.

Geocollections can refer to vector or raster data. For example, one geocollection could be the set of world countries, as described by the Natural Earth data, while another could be the raster cells used in a particular impact assessment method.

Geocollections have two purposes in Brightway2:

  • They provide a conceptual and physical grouping of spatial data into manageable units
  • They provide a way to uniquely identify spatial data

A geocollection is not a geodatabase - no spatial data is required. However, if the original spatial data is available (as a vector or raster file), it can be specified and used later in analysis.

A geocollection is specified by a unique string, which is usually the name of the geocollection.

Metadata fields

There are no required metadata fields for geocollections, but some common fields are useful when the original data is available. All the below listed metadata field values should be strings:

  • filepath: Filepath for the vector or raster file
  • layer: Layer name (vector data only)
  • field: Field name that uniquely identifies each feature (vector data only)
  • vfs: Virtual file system used to load zipped shapefiles
  • encoding: Text encoding

Note

It is strongly preferred that ESRI shapefiles be zipped into a single file, with appropriate metadata, so that each geocollection has its associated spatial data in a single file. For example, the Natural Earth 50m political data is specified as: vfs="zip:///Users/cmutel/Downloads/Geodata/ne_50m_admin_0_countries.zip", filepath="/ne_50m_admin_0_countries.shp". See the Fiona manual for more.

Standard geocollections

The following standard geocollections are installed automatically:

  • global: Only the global location “GLO”
  • countries: All the countries in the world, as defined by ISO, and identified by the ISO 2-letter codes.
  • regions: UN regions and subregions
  • ecoinvent 2: Special regions defined by ecoinvent in version 2 of the database
  • ecoinvent 3: Special regions defined by ecoinvent in version 3 of the database

Regionalized impact assessment

Regionalized impact assessment methods have characterization factors that depend both on biosphere flows and spatial units. Characterization factors can be static or uncertain.

In Brightway2, the Method object can store site-generic, regionalized, or both site-generic and regionalized characterization factors. Similarly, the methods metadata store works the same for site-generic and regionalized IA methods.

Specifying spatial objects

Following the Brightway2 principle of KISS, spatial objects have a simple format with reasonable defaults:

  1. If no location data is provided, the global location “GLO” is assumed
  2. A two-letter ISO country code can be used
  3. Otherwise, spatial objects should be identified by the combination of geocollection and id, just like objects in Databases: (geocollection, spatial unit id).
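A minimal sketch of how these three rules could be applied to normalize location specifiers. The helper normalize_location is hypothetical and not part of the bw2regional API. Following the Defining countries section, it assumes that two-letter codes belong to the "world" geocollection.

```python
GLOBAL = "GLO"


def normalize_location(location=None):
    """Return a spatial object key following the simple defaults.

    1. No location -> the global location "GLO".
    2. A two-letter uppercase ISO country code -> ("world", code); using the
       "world" geocollection here is an assumption taken from the setup docs.
    3. Anything else must already be a (geocollection, spatial unit id) tuple.
    """
    if location is None or location == GLOBAL:
        return GLOBAL
    if isinstance(location, str):
        if len(location) == 2 and location.isalpha() and location.isupper():
            return ("world", location)
        raise ValueError(f"Unrecognized location string: {location!r}")
    if isinstance(location, tuple) and len(location) == 2:
        return location
    raise ValueError(f"Expected (geocollection, id), got {location!r}")


for example in [None, "DE", ("watersheds", "Rhine watershed")]:
    print(repr(example), "->", normalize_location(example))
```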

Metadata

In addition to the standard metadata for IA methods, such as unit and description, regionalized IA methods should include the following for complete functionality:

  • geocollections: List of one or more strings identifying the geocollections. Normally only one geocollection is associated with a regionalized IA method. See Geocollections.
  • band: Band number in original raster data set. Needed to import characterization factors.
  • cf_field: Field name of characterization factor value. Needed to import characterization factors.

Data format

Site-generic IA methods have a simple data format:

[
    [biosphere flow, maybe uncertainty],
]

Here, maybe uncertainty is either a floating point number (implying no uncertainty) or a stats_arrays uncertainty dictionary, like:

{'loc': 2, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id}

Note

In site-generic CFs, where a location is not given, the “GLO” location is assumed.

Regionalized IA methods are almost the same; they just have an additional field for a location specifier.

[
    [biosphere flow, maybe uncertainty, spatial object],
]
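Because a regionalized row is a site-generic row with a location added, the same code can read both formats. This hypothetical sketch builds a {(flow, location): value} lookup. The flow keys and CF values are made up, and using "loc" as the static value of an uncertain CF is an assumption. The uncertainty type id 3, which is stats_arrays.NormalUncertainty.id, is hard-coded so that the sketch does not depend on stats_arrays.

```python
# Made-up CFs in the two formats. 3 is stats_arrays.NormalUncertainty.id,
# hard-coded here to keep the sketch dependency-free.
NORMAL_UNCERTAINTY_ID = 3

site_generic = [
    [("biosphere", "chlorine"), 1.5],
    [("biosphere", "ammonia"),
     {"loc": 2, "scale": 0.5, "uncertainty_type": NORMAL_UNCERTAINTY_ID}],
]

regionalized = [
    [("biosphere", "chlorine"), 1.0, ("watersheds", "Danube")],
    [("biosphere", "chlorine"), 3.0, ("watersheds", "Rhine")],
]


def static_value(maybe_uncertainty):
    """Return a float CF: the number itself, or the distribution's 'loc'.

    Taking 'loc' as the static value is an assumption that suits normal
    distributions; other distribution types may need a different rule.
    """
    if isinstance(maybe_uncertainty, dict):
        return float(maybe_uncertainty["loc"])
    return float(maybe_uncertainty)


def cf_lookup(rows):
    """Map (biosphere flow, location) to a static CF.

    Rows without a third element are site-generic, so "GLO" is assumed.
    """
    lookup = {}
    for row in rows:
        flow, maybe_uncertainty = row[0], row[1]
        location = row[2] if len(row) > 2 else "GLO"
        lookup[(flow, location)] = static_value(maybe_uncertainty)
    return lookup


print(cf_lookup(site_generic))
print(cf_lookup(regionalized))
```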

Loadings

Background loadings are data on the spatial patterns of emission, and are used to determine the relative likelihood that a given inventory dataset occurs in an impact assessment spatial unit. The idea is that the existing patterns of emissions are reasonable predictors of where present or future emissions will occur.

In Brightway2, background loadings are represented by Loading objects, and metadata about all loadings is stored in loadings.

Because loadings are a density of predicted activity, their unit is physical quantity (e.g. mass or energy) per unit area. Loadings are multiplied by intersected areas, and then normalized by total loading, so their units cancel out in the end.

As loadings are emission-specific, this could mean that different loadings for different biosphere flows could predict different spatial patterns of inventory activity. There is no real research on the importance of this inconsistency.

Background loadings are, in general, supplied by the impact assessment method developers. If no loadings are supplied, the generic fallback is to allocate impact assessment spatial units to inventory spatial units by intersected area; however, proxy loadings can be independently calculated. See this example ipython notebook for one such procedure.
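The normalization can be illustrated for a single inventory location. In this sketch, allocation_shares is a hypothetical helper, and the areas and loadings are made up. It compares loading-weighted shares with the area-only fallback and shows that rescaling the loadings (i.e. changing their units) does not change the shares.

```python
def allocation_shares(areas, loadings=None):
    """Return {IA unit: share of one inventory location's activity}.

    areas maps IA unit -> area intersected with the inventory location.
    loadings maps IA unit -> loading density (quantity per unit area).
    Without loadings, the shares fall back to intersected area alone.
    """
    if loadings is None:
        weights = dict(areas)
    else:
        weights = {unit: area * loadings[unit] for unit, area in areas.items()}
    total = sum(weights.values())
    return {unit: weight / total for unit, weight in weights.items()}


# Made-up intersected areas and loadings for one inventory location.
areas = {"Rhine": 2.0, "Danube": 3.0}
loadings = {"Rhine": 10.0, "Danube": 1.0}

by_area = allocation_shares(areas)               # {'Rhine': 0.4, 'Danube': 0.6}
by_loading = allocation_shares(areas, loadings)  # Rhine gets 20/23, Danube 3/23

# Expressing the loadings in other units (here x1000) gives the same shares,
# up to floating point rounding, because the normalization cancels the units.
scaled = allocation_shares(areas, {u: v * 1000 for u, v in loadings.items()})
```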

Background loadings should always use the same geocollection as their IA methods.

Metadata

There are no required fields for background loadings, as most metadata comes from the IA method. Fields such as description can be used.

Data format

[
    [maybe_uncertainty, location_id],
]

Warning

Industry sector-specific loadings and emission-specific loadings are not yet supported in bw2regional.

Intersections

Data for the geographic transform matrix G is stored in Intersection objects. In Brightway2, areal intersection data is represented by the Intersection object, and the metadata store is intersections. Each combination of geocollections should be a separate Intersection object.

Intersection data is calculated using pandarus. Conversion from the pandarus data format is described in Load Pandarus output.

Metadata

No metadata is required for intersections.

The pandarus-filepath field is the filepath of the Pandarus file, and is set automatically when imported.

Data format

We assume that data is written automatically after conversion from pandarus, so users shouldn’t be writing or manipulating intersection data themselves. Nevertheless, here is the data format:

[
    [spatial object 1, spatial object 2, intersection area],
]
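Users normally never edit these rows, but it helps to see how they map onto the geographic transform matrix G. In this sketch, build_g is a hypothetical helper and the areas are made up. It converts intersection rows into a dense NumPy array, with rows following the first geocollection and columns following the second. A real implementation would use a sparse matrix.

```python
import numpy as np


def build_g(rows):
    """Return (G, row index, column index) from intersection rows.

    Each row is [spatial object 1, spatial object 2, intersection area];
    rows of G follow the first geocollection, columns the second.
    """
    row_index, col_index = {}, {}
    for first, second, _ in rows:
        row_index.setdefault(first, len(row_index))
        col_index.setdefault(second, len(col_index))
    g = np.zeros((len(row_index), len(col_index)))
    for first, second, area in rows:
        g[row_index[first], col_index[second]] += area
    return g, row_index, col_index


# Made-up areas; pairs that do not intersect are simply absent.
rows = [
    [("world", "FR"), ("watersheds", "Rhone"), 90.0],
    [("world", "CH"), ("watersheds", "Rhone"), 8.0],
    [("world", "CH"), ("watersheds", "Rhine"), 28.0],
]
g, row_index, col_index = build_g(rows)
print(g)
```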

Extension tables

Metadata

In addition to general metadata such as description, extension tables should include the following for complete functionality:

  • geocollection: String identifying a geocollection. See Geocollections.
  • xt_field: Field name used for extension table values. Only needed for vector spatial data.
  • band: Raster band index for extension table values.

Data format

[
    [float, spatial object 1],
]

Understanding Regionalized LCA

Two Spatial Scales with Generic Loading

We start with a relatively complex example of regionalized LCA - the inventory database and impact assessment method have different spatial scales, and we have background loading data which is generic to all biosphere flows. In this case, we have the following formula:

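\[h_{r} = \left[ \textbf{MNGLR} \right]^{T} \circ \left[ \textbf{B} \cdot \operatorname{diag}(\textbf{A}^{-1}f) \right]\]

Here diag(A^{-1}f) is the diagonal matrix of the supply vector, so both bracketed terms have rows of biosphere flows and columns of inventory activities, and ∘ is the elementwise (Hadamard) product. The regionalized LCIA score is the sum of all elements of h_r.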

Let’s start with R, the regionalized characterization matrix. R provides characterization factors for each biosphere flow and each impact assessment spatial unit. The spatial scale for impact assessment depends on the impact category, but it is generally something like watersheds or ecoregions.

R has rows of impact assessment spatial units and columns of biosphere flows. In our example we are using made-up numbers, so we can make up R as follows:

\[\begin{split}\textbf{R} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\end{split}\]

So, in the first row (perhaps the Danube watershed) and the first column (perhaps emission of elemental chlorine), there is a characterization factor of 1. The units in R are units of damage, either at the mid- or endpoint.

We now start estimating where our inventory activities occur. Of course, we already know where they occur, at least in our version of the world: each inventory dataset should have a location, and the mapping matrix M indicates which inventory location each inventory dataset is associated with. The number of inventory locations is always less than or equal to the number of inventory datasets, as places on earth for which we have no inventory datasets don’t exist in our model of the world. M has rows of inventory activities (e.g. make steel), and columns of inventory spatial units (e.g. Georgia). If the activity occurs in the given location, there is a one in M, and otherwise a zero. M is unitless.

Here is our example mapping matrix:

\[\begin{split}\textbf{M} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

This M tells us that the activity in the first row, whatever that is, happens in the first inventory location, wherever that is.
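M can be built directly from the location of each inventory activity. In this sketch, the helper build_m and the location codes are hypothetical. The codes are chosen so that the result reproduces the example M above, where the last two activities share a location.

```python
import numpy as np

# Hypothetical locations of the five example activities, in row order.
activity_locations = ["DE", "FR", "CH", "US", "US"]


def build_m(locations):
    """Return (M, location index); M[i, j] = 1 if activity i is in location j."""
    location_index = {}
    for loc in locations:
        location_index.setdefault(loc, len(location_index))
    m = np.zeros((len(locations), len(location_index)))
    for row, loc in enumerate(locations):
        m[row, location_index[loc]] = 1
    return m, location_index


m, location_index = build_m(activity_locations)
print(m)
```

Each row has exactly one 1, and there are never more columns than rows. This matches the statement that there are no more inventory locations than inventory datasets.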

So we know where our inventory activities are in the spatial scale of the inventory, but not in the spatial scale of the impact assessment. To apply the characterization factors in the spatial scale of the impact assessment to the calculated inventory, we need to match the two scales.

We can use a GIS, or more simply, the pandarus utility, to calculate the intersected areas of each inventory spatial unit with each impact assessment spatial unit. This is the first step in figuring out where activities occur in the spatial scale of the impact assessment: we could allocate based on intersected area. However, area is a pretty poor proxy for where industrial activities actually occur, so we will need additional information.

The matrix G gives us the intersected areas for the two spatial scales. For example, it could tell us that the Rhone watershed has a large amount of area within the borders of France, but none in Brazil. G has rows of inventory spatial units, like countries or political unions, and columns of impact assessment spatial units, like watersheds or ecoregions. G has units of area, by convention square meters in brightway2 and pandarus.

Our example G is:

\[\begin{split}\textbf{G} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 0 & 5 & 8 \\ 0 & 0 & 13 \end{bmatrix}\end{split}\]

In this example matrix, …

It now gets a bit tricky to think about, because we are going to use environmental data to estimate industrial activity, whereas in LCA we mostly go in the other direction. We want to know how likely it is that industrial activity, or more generally, activity which emits or consumes resources, occurs in each impact assessment region. We can’t just use the area of the impact assessment region, because the world’s economic activity is not uniformly distributed in space. For example, there are more activities in the Rhine watershed than in the Lena river watershed. We can use databases of existing environmental loadings to estimate where activities and their emissions are taking place.

The background loading matrix L represents our best knowledge about where inventory activities happen, based on where these activities are happening now and the amount that is currently emitted to the environment. L is diagonal, with rows and columns of impact assessment spatial units: it has loading values on the diagonal and zeros elsewhere.

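The normalization matrix N is a diagonal matrix with rows and columns of inventory spatial units. It rescales each row of GL so that it sums to one. This turns loading-weighted intersected areas into the fraction of each inventory location's activity that is allocated to each impact assessment spatial unit: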

\[\textbf{N}_{i,i} = \left[ \sum_{j} \left( \textbf{GL} \right)_{i,j} \right]^{-1}\]
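The whole calculation can be checked numerically with a worked sketch that uses the example M, G, and R above. The text gives no values for L, B, or the supply vector s = A^{-1}f, so the values below are hypothetical. A Hadamard product needs two matrices of the same shape. For this reason, the sketch multiplies [MNGLR]^T elementwise with the inventory matrix B·diag(s), which has rows of biosphere flows and columns of activities. It then sums all elements to get a score.

```python
import numpy as np

M = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)  # activities x inventory locations
G = np.array([[1, 0, 0],
              [2, 3, 0],
              [0, 5, 8],
              [0, 0, 13]], dtype=float)    # inventory locations x IA units
R = np.array([[1, 2],
              [3, 4],
              [5, 6]], dtype=float)        # IA units x biosphere flows
L = np.diag([1.0, 2.0, 0.5])               # hypothetical loadings per IA unit

# N is diagonal: one over each row sum of GL, so each row of NGL sums to one.
GL = G @ L
N = np.diag(1.0 / GL.sum(axis=1))
allocation = N @ GL                        # inventory locations x IA units

# Effective regionalized CF for each (activity, biosphere flow) pair.
MNGLR = M @ allocation @ R                 # activities x biosphere flows

# Hypothetical biosphere matrix B (flows x activities) and supply s = A^-1 f.
B = np.array([[1, 0, 2, 0, 1],
              [0, 1, 0, 3, 1]], dtype=float)
s = np.array([1.0, 2.0, 0.5, 1.0, 4.0])

# B diag(s) has the same shape as MNGLR^T, so the elementwise product is defined.
inventory = B @ np.diag(s)
h_r = MNGLR.T * inventory
score = h_r.sum()
print(score)
```

With real data, all of these matrices are large and sparse, so a scipy.sparse implementation would be the natural choice. Dense arrays are used here only to keep the arithmetic easy to follow.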

Common tasks with regionalized data

Match geocollections using Pandarus

Convert geocollection to Method data

Note

Conversion requires access to the actual spatial data and all needed metadata, i.e. filepaths, layer and field name for shapefiles, and raster band for rasters.

Convert geocollection to Loading data

Load Pandarus output

To load the output from a Pandarus calculation, use Intersection.import_from_pandarus like:

Intersection(("from geocollection name", "to geocollection name")).import_from_pandarus("")

Note

Be sure to pass the --lca flag if calling Pandarus from the command line.

Warning

Make sure the order of the two geocollections is the same in the Intersection object as it was in the Pandarus calculation.

Base data needed for regionalized calculations

Defining countries

bw2regionalsetup() creates a geocollection "world" and a topocollection "world". It is quite useful to label countries by their ISO 3166-1 alpha-2 country codes (e.g. “DE” for Germany), so we don’t require these locations to be given a complex location key like ("world", "DE"); instead, we treat any two-letter country code as if it came from the "world" geocollection.

Defining ecoinvent-specific locations

Ecoinvent defines a large number of additional locations, like “Europe” or “Canada without Alberta and Quebec”. See the constructive geometries source repository for more information on ecoinvent-specific locations. These locations are handled in a separate geocollection, "ecoinvent", as well as the "ecoinvent" topocollection.

Topographies

Extension tables

Summary

bw2regionalsetup() creates the following:

Geocollections

  • world
  • ecoinvent
  • RoW
  • gdp-weighted-pop-density

Topocollections

  • world
  • ecoinvent
  • RoW
  • gdp-weighted-pop-density

Topographies

  • world
  • ecoinvent
  • RoW

Extension tables

  • gdp-weighted-pop-density

Intersections

The following intersections are only created if the default pandarus_remote server, https://pandarus.brightwaylca.org, is available.

  • (‘world’, ‘gdp-weighted-pop-density’)
  • (‘ecoinvent’, ‘gdp-weighted-pop-density’)
  • (‘RoW’, ‘gdp-weighted-pop-density’)

Regionalization libraries

Constructive Geometries

This repository contains the scripts and data needed to build a consistent topology of the world (provinces, countries, and states), needed for the ecoinvent life cycle inventory database. It also includes the ability to define recipes to generate custom locations.

Py-Constructive-Geometries

Brightway2-regional uses the py-constructive-geometries library, which includes a topographical map of the world, as well as a few functions for manipulating topographical geometries.

constructive_geometries.ConstructiveGeometries

Pandarus

Pandarus is software for taking two geospatial data sets (either raster or vector), and efficiently calculating their combined intersected areas. Brightway2-regional is designed to import the calculation results from Pandarus. See the source code repository for more information.

Pandarus Remote

Pandarus remote is a web service for processing and managing data for regionalized life cycle assessment using Pandarus. Many large GIS calculations are better done on servers with enough resources to handle everything in memory. See the source code repository for API endpoints and installation instructions.

Technical reference

Regionalization base class

Two spatial scales with generic loading

Two spatial scales

Shared spatial scale

Extension tables

Topographies

Development

bw2regional is developed by Chris Mutel, previously during his work as a postdoctoral assistant in the Ecological Systems Design group at ETH Zürich, and currently as a scientist in the Technology Assessment group at the Paul Scherrer Institute.

Source code is available on bitbucket.
