PySAL¶
PySAL is an open source library of spatial analysis functions written in Python, intended to support the development of high-level applications. PySAL is open source under the BSD License.
User Guide¶
Introduction¶
History¶
PySAL grew out of a collaborative effort between Luc Anselin’s group previously located at the University of Illinois, Champaign-Urbana, and Serge Rey who was at San Diego State University. It was born out of a recognition that the respective projects at the two institutions, PySpace (now GeoDaSpace) and STARS (Space Time Analysis of Regional Systems), could benefit from a shared analytical core, since this would limit code duplication and free up additional developer time to focus on enhancements of the respective applications.
This recognition also came at a time when Python was starting to make major inroads in geographic information systems as represented by projects such as the Python Cartographic Library, Shapely and ESRI’s adoption of Python as a scripting language, among others. At the same time there was a dearth of Python modules for spatial statistics, spatial econometrics, location modeling and other areas of spatial analysis, and the role for PySAL was then expanded beyond its support of STARS and GeoDaSpace to provide a library of core spatial analytical functions that could support the next generation of spatial analysis applications.
In 2008 the home for PySAL moved to the GeoDa Center for Geospatial Analysis and Computation at Arizona State University.
Scope¶
It is important to underscore what PySAL is, and is not, designed to do. First and foremost, PySAL is a library in the fullest sense of the word. Developers looking for a suite of spatial analytical methods that they can incorporate into application development should feel at home using PySAL. Spatial analysts who may be carrying out research projects requiring customized scripting, extensive simulation analysis, or those seeking to advance the state of the art in spatial analysis should also find PySAL to be a useful foundation for their work.
End users looking for a user friendly graphical user interface for spatial analysis should not turn to PySAL directly. Instead, we would direct them to projects like STARS and the GeoDaX suite of software products which wrap PySAL functionality in GUIs. At the same time, we expect that with developments such as the Python based plugin architectures for QGIS, GRASS, and the toolbox extensions for ArcGIS, that end user access to PySAL functionality will be widening in the near future.
Research Papers and Presentations¶
 Rey, Sergio J. (2012) PySAL: A Python Library for Exploratory Spatial Data Analysis and Geocomputation (Movie) SciPy 2012.
 Rey, Sergio J. and Luc Anselin. (2010) PySAL: A Python Library of Spatial Analytical Methods. In M. Fischer and A. Getis (eds.) Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications. Springer, Berlin.
 Rey, Sergio J. and Luc Anselin. (2009) PySAL: A Python Library for Spatial Analysis and Geocomputation. (Movie) Python for Scientific Computing. Caltech, Pasadena, CA August 2009.
 Rey, Sergio J. (2009). Show Me the Code: Spatial Analysis and Open Source. Journal of Geographical Systems 11: 191-207.
 Rey, S.J., Anselin, L., & M. Hwang. (2008). Dynamic Manipulation of Spatial Weights Using Web Services. GeoDa Center Working Paper 2008-12.
Install PySAL¶
Windows users can download an .exe installer here on Sourceforge.
PySAL is built upon the Python scientific stack including numpy and scipy. While these libraries are packaged for several platforms, the Anaconda and Enthought Python distributions include them along with the core Python library.
Note that while both Anaconda and Enthought Canopy will satisfy the dependencies for PySAL, the version of PySAL included in these distributions might be behind the latest stable release of PySAL. You can update to the latest stable version of PySAL with either of these distributions as follows:
 In a terminal, start the Python version associated with the distribution. Make sure you are not using a different (system) version of Python. To check this, run which python from a terminal and see whether Anaconda or Enthought appears in the output.
 pip install -U pysal
If you do not wish to use either Anaconda or Enthought, ensure that PySAL's required scientific packages, numpy and scipy, are available on your machine.
Getting your feet wet¶
You can start using PySAL right away on the web with Wakari, PythonAnywhere, or SageMathCloud.
wakari http://continuum.io/wakari
PythonAnywhere https://www.pythonanywhere.com/
SageMathCloud https://cloud.sagemath.com/
Download and install¶
PySAL is available on the Python Package Index, which means it can be downloaded and installed manually or from the command line using pip, as follows:
pip install pysal
Alternatively, grab the source distribution (.tar.gz) and decompress it to your selected destination. Open a command shell and navigate to the decompressed pysal folder. Type:
pip install .
Development version on GitHub¶
Developers can check out PySAL using git:
git clone https://github.com/pysal/pysal.git
Open a command shell and navigate to the cloned pysal directory. Type:
pip install -e .[dev]
The '-e' option installs the modules in place and symlinks from the Python site-packages directory to the pysal folder. The advantage of this method is that you get the latest code but don’t have to fuss with editing system environment variables.
To test your setup, start a Python session and type:
>>> import pysal
Keep up to date with pysal development by ‘pulling’ the latest changes:
git pull
Windows¶
To keep up to date with PySAL development, you will need a Git client that allows you to access and update the code from our repository. We recommend GitHub Windows for a more graphical client, or Git Bash for a command line client. The latter gives you a nice Unix-like shell with familiar commands. Here is a nice tutorial on getting going with Open Source software on Windows.
After cloning pysal, install it in develop mode so Python knows where to find it.
Open a command shell and navigate to the cloned pysal directory. Type:
pip install -e .[dev]
To test your setup, start a Python session and type:
>>> import pysal
Keep up to date with pysal development by ‘pulling’ the latest changes:
git pull
Troubleshooting¶
If you experience problems when building, installing, or testing pysal, ask for help on the OpenSpace list or browse the archives of the pysal-dev Google group.
Please include the output of the following commands in your message:
Platform information:
python -c 'import os, sys; print(os.name, sys.platform)'
uname -a
Python version:
python -c 'import sys; print(sys.version)'
SciPy version:
python -c 'import scipy; print(scipy.__version__)'
NumPy version:
python -c 'import numpy; print(numpy.__version__)'
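The four commands above can also be collected in one short, throwaway Python script. This is just a convenience sketch for gathering the requested details; nothing in it is part of PySAL, and `version_report` is a hypothetical helper name:

```python
# A throwaway helper that gathers the platform and version details
# requested above in one shot; nothing here is PySAL-specific.
import os
import sys

def version_report():
    """Return the platform, Python, numpy, and scipy details as text."""
    lines = ["platform: %s %s" % (os.name, sys.platform),
             "python: %s" % sys.version.split()[0]]
    for mod in ("numpy", "scipy"):
        try:
            lines.append("%s: %s" % (mod, __import__(mod).__version__))
        except ImportError:
            lines.append("%s: not installed" % mod)
    return "\n".join(lines)

print(version_report())
```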
Feel free to add any other relevant information. For example, the full output (both stdout and stderr) of the pysal installation command can be very helpful. Since this output can be rather large, ask before sending it into the mailing list (or better yet, to one of the developers, if asked).
Getting Started with PySAL¶
Introduction to the Tutorials¶
Assumptions¶
The tutorials presented here are designed to illustrate a selection of the functionality in PySAL. Further details on PySAL functionality not covered in these tutorials can be found in the API. The reader is assumed to have working knowledge of the particular spatial analytical methods illustrated. Background on spatial analysis can be found in the references cited in the tutorials.
It is also assumed that the reader has already installed PySAL.
Examples¶
The examples use several sample data sets that are included in the pysal/examples directory. In the examples that follow, we refer to those using the path:
../pysal/examples/filename_of_example
You may need to adjust this path to match the location of the sample files on your system.
Getting Help¶
Help for PySAL is available from a number of sources.
email lists¶
The main channel for user support is the openspace mailing list.
Questions regarding the development of PySAL should be directed to pysal-dev.
Documentation¶
Documentation is available online at pysal.org.
You can also obtain help at the interpreter:
>>> import pysal
>>> help(pysal)
which would bring up help on PySAL:
Help on package pysal:
NAME
pysal
FILE
/Users/serge/Dropbox/pysal/src/trunk/pysal/__init__.py
DESCRIPTION
Python Spatial Analysis Library
===============================
Documentation

PySAL documentation is available in two forms: python docstrings and a html webpage at http://pysal.org/
Available subpackages

cg
:
Note that you can use this on any option within PySAL:
>>> w=pysal.lat2W()
>>> help(w)
which brings up:
Help on W in module pysal.weights object:
class W(__builtin__.object)
 Spatial weights

 Parameters
 
 neighbors : dictionary
 key is region ID, value is a list of neighbor IDS
 Example: {'a':['b'],'b':['a','c'],'c':['b']}
 weights = None : dictionary
 key is region ID, value is a list of edge weights
 If not supplied, all edge weights are assumed to have a weight of 1.
 Example: {'a':[0.5],'b':[0.5,1.5],'c':[1.5]}
 id_order = None : list
 An ordered list of ids, defines the order of
 observations when iterating over W if not set,
 lexicographical ordering is used to iterate and the
 id_order_set property will return False. This can be
 set after creation by setting the 'id_order' property.

Note that the help is truncated at the bottom of the terminal window and more of the contents can be seen by scrolling (hit any key).
An Overview of the FileIO system in PySAL.¶
Contents
 An Overview of the FileIO system in PySAL.
 Introduction
 Examples: Reading files
 Shapefiles
 DBF Files
 CSV Files
 WKT Files
 GeoDa Text Files
 GAL Binary Weights Files
 GWT Weights Files
 ArcGIS Text Weights Files
 ArcGIS DBF Weights Files
 ArcGIS SWM Weights Files
 DAT Weights Files
 MATLAB MAT Weights Files
 LOTUS WK1 Weights Files
 GeoBUGS Text Weights Files
 STATA Text Weights Files
 MatrixMarket MTX Weights Files
 Examples: Writing files
 Examples: Converting the format of spatial weights files
 Alternative Tabular API
Introduction¶
PySAL contains a file input-output API that stands as a reference pure Python implementation for spatial IO. The design goal for this API is to abstract file handling and return native PySAL data types when reading from known file types. A list of known extensions can be found by issuing the following command:
pysal.open.check()
Note that in some cases the FileIO module will peek inside your file to determine its type. For example, "geoda_txt" is just a unique scheme for ".txt" files, so when opening a ".txt" file pysal will peek inside to determine if it has the necessary header information and dispatch accordingly. In the event that pysal does not understand your file, IO operations will be dispatched to Python's internal open.
PySAL can also fully leverage Geopandas in analyses. It also provides an alternative, tabular data IO system, pdio.
Examples: Reading files¶
Shapefiles¶
>>> import pysal
>>> shp = pysal.open('../pysal/examples/10740.shp')
>>> poly = shp.next()
>>> type(poly)
<class 'pysal.cg.shapes.Polygon'>
>>> len(shp)
195
>>> shp.get(len(shp)-1).id
195
>>> polys = list(shp)
>>> len(polys)
195
DBF Files¶
>>> import pysal
>>> db = pysal.open('../pysal/examples/10740.dbf','r')
>>> db.header
['GIST_ID', 'FIPSSTCO', 'TRT2000', 'STFID', 'TRACTID']
>>> db.field_spec
[('N', 8, 0), ('C', 5, 0), ('C', 6, 0), ('C', 11, 0), ('C', 10, 0)]
>>> db.next()
[1, '35001', '000107', '35001000107', '1.07']
>>> db[0]
[[1, '35001', '000107', '35001000107', '1.07']]
>>> db[0:3]
[[1, '35001', '000107', '35001000107', '1.07'], [2, '35001', '000108', '35001000108', '1.08'], [3, '35001', '000109', '35001000109', '1.09']]
>>> db[0:5,1]
['35001', '35001', '35001', '35001', '35001']
>>> db[0:5,0:2]
[[1, '35001'], [2, '35001'], [3, '35001'], [4, '35001'], [5, '35001']]
>>> db[1,1]
['9712']
CSV Files¶
>>> import pysal
>>> db = pysal.open('../pysal/examples/stl_hom.csv')
>>> db.header
['WKT', 'NAME', 'STATE_NAME', 'STATE_FIPS', 'CNTY_FIPS', 'FIPS', 'FIPSNO', 'HR7984', 'HR8488', 'HR8893', 'HC7984', 'HC8488', 'HC8893', 'PO7984', 'PO8488', 'PO8893', 'PE77', 'PE82', 'PE87', 'RDAC80', 'RDAC85', 'RDAC90']
>>> db[0]
[['POLYGON ((-89.585220336914062 39.978794097900391,-89.581146240234375 40.094867706298828,-89.603988647460938 40.095306396484375,-89.60589599609375 40.136119842529297,-89.6103515625 40.3251953125,-89.269027709960938 40.329566955566406,-89.268562316894531 40.285579681396484,-89.154655456542969 40.285774230957031,-89.152763366699219 40.054969787597656,-89.151618957519531 39.919403076171875,-89.224777221679688 39.918678283691406,-89.411857604980469 39.918041229248047,-89.412437438964844 39.931644439697266,-89.495201110839844 39.933486938476562,-89.4927978515625 39.980186462402344,-89.585220336914062 39.978794097900391))', 'Logan', 'Illinois', 17, 107, 17107, 17107, 2.115428, 1.290722, 1.624458, 4, 2, 3, 189087, 154952, 184677, 5.10432, 6.59578, 5.832951, 0.991256, 0.940265, 0.845005]]
>>> fromWKT = pysal.core.util.wkt.WKTParser()
>>> db.cast('WKT',fromWKT)
>>> type(db[0][0][0])
<class 'pysal.cg.shapes.Polygon'>
>>> db[0][0][1:]
['Logan', 'Illinois', 17, 107, 17107, 17107, 2.115428, 1.290722, 1.624458, 4, 2, 3, 189087, 154952, 184677, 5.10432, 6.59578, 5.832951, 0.991256, 0.940265, 0.845005]
>>> polys = db.by_col('WKT')
>>> from pysal.cg import standalone
>>> standalone.get_bounding_box(polys)[:]
[-92.70067596435547, 36.88180923461914, -87.91657257080078, 40.329566955566406]
WKT Files¶
>>> import pysal
>>> wkt = pysal.open('../pysal/examples/stl_hom.wkt', 'r')
>>> polys = wkt.read()
>>> wkt.close()
>>> print(len(polys))
78
>>> print(polys[1].centroid)
(-91.19578469430738, 39.990883050220845)
GeoDa Text Files¶
>>> import pysal
>>> geoda_txt = pysal.open('../pysal/examples/stl_hom.txt', 'r')
>>> geoda_txt.header
['FIPSNO', 'HR8488', 'HR8893', 'HC8488']
>>> print(len(geoda_txt))
78
>>> geoda_txt.dat[0]
['17107', '1.290722', '1.624458', '2']
>>> geoda_txt._spec
[<type 'int'>, <type 'float'>, <type 'float'>, <type 'int'>]
>>> geoda_txt.close()
GAL Binary Weights Files¶
>>> import pysal
>>> gal = pysal.open('../pysal/examples/sids2.gal','r')
>>> w = gal.read()
>>> gal.close()
>>> w.n
100
GWT Weights Files¶
>>> import pysal
>>> gwt = pysal.open('../pysal/examples/juvenile.gwt', 'r')
>>> w = gwt.read()
>>> gwt.close()
>>> w.n
168
ArcGIS Text Weights Files¶
>>> import pysal
>>> arcgis_txt = pysal.open('../pysal/examples/arcgis_txt.txt','r','arcgis_text')
>>> w = arcgis_txt.read()
>>> arcgis_txt.close()
>>> w.n
3
ArcGIS DBF Weights Files¶
>>> import pysal
>>> arcgis_dbf = pysal.open('../pysal/examples/arcgis_ohio.dbf','r','arcgis_dbf')
>>> w = arcgis_dbf.read()
>>> arcgis_dbf.close()
>>> w.n
88
ArcGIS SWM Weights Files¶
>>> import pysal
>>> arcgis_swm = pysal.open('../pysal/examples/ohio.swm','r')
>>> w = arcgis_swm.read()
>>> arcgis_swm.close()
>>> w.n
88
DAT Weights Files¶
>>> import pysal
>>> dat = pysal.open('../pysal/examples/wmat.dat','r')
>>> w = dat.read()
>>> dat.close()
>>> w.n
49
MATLAB MAT Weights Files¶
>>> import pysal
>>> mat = pysal.open('../pysal/examples/spatsymus.mat','r')
>>> w = mat.read()
>>> mat.close()
>>> w.n
46
LOTUS WK1 Weights Files¶
>>> import pysal
>>> wk1 = pysal.open('../pysal/examples/spatsymus.wk1','r')
>>> w = wk1.read()
>>> wk1.close()
>>> w.n
46
GeoBUGS Text Weights Files¶
>>> import pysal
>>> geobugs_txt = pysal.open('../pysal/examples/geobugs_scot','r','geobugs_text')
>>> w = geobugs_txt.read()
WARNING: there are 3 disconnected observations
Island ids: [6, 8, 11]
>>> geobugs_txt.close()
>>> w.n
56
STATA Text Weights Files¶
>>> import pysal
>>> stata_txt = pysal.open('../pysal/examples/stata_sparse.txt','r','stata_text')
>>> w = stata_txt.read()
WARNING: there are 7 disconnected observations
Island ids: [5, 9, 10, 11, 12, 14, 15]
>>> stata_txt.close()
>>> w.n
56
MatrixMarket MTX Weights Files¶
This file format, or a variant of it, is currently under consideration by the PySAL team as a way to store general spatial weights in sparse matrix form.
>>> import pysal
>>> mtx = pysal.open('../pysal/examples/wmat.mtx','r')
>>> w = mtx.read()
>>> mtx.close()
>>> w.n
49
Examples: Writing files¶
GAL Binary Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> gal = pysal.open('../pysal/examples/virginia_queen.gal','w')
>>> gal.write(w)
>>> gal.close()
GWT Weights Files¶
Writing GWT files is not currently supported.
ArcGIS Text Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> arcgis_txt = pysal.open('../pysal/examples/virginia_queen.txt','w','arcgis_text')
>>> arcgis_txt.write(w, useIdIndex=True)
>>> arcgis_txt.close()
ArcGIS DBF Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> arcgis_dbf = pysal.open('../pysal/examples/virginia_queen.dbf','w','arcgis_dbf')
>>> arcgis_dbf.write(w, useIdIndex=True)
>>> arcgis_dbf.close()
ArcGIS SWM Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> arcgis_swm = pysal.open('../pysal/examples/virginia_queen.swm','w')
>>> arcgis_swm.write(w, useIdIndex=True)
>>> arcgis_swm.close()
DAT Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> dat = pysal.open('../pysal/examples/virginia_queen.dat','w')
>>> dat.write(w)
>>> dat.close()
MATLAB MAT Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> mat = pysal.open('../pysal/examples/virginia_queen.mat','w')
>>> mat.write(w)
>>> mat.close()
LOTUS WK1 Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> wk1 = pysal.open('../pysal/examples/virginia_queen.wk1','w')
>>> wk1.write(w)
>>> wk1.close()
GeoBUGS Text Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> geobugs_txt = pysal.open('../pysal/examples/virginia_queen','w','geobugs_text')
>>> geobugs_txt.write(w)
>>> geobugs_txt.close()
STATA Text Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> stata_txt = pysal.open('../pysal/examples/virginia_queen.txt','w','stata_text')
>>> stata_txt.write(w,matrix_form=True)
>>> stata_txt.close()
MatrixMarket MTX Weights Files¶
>>> import pysal
>>> w = pysal.queen_from_shapefile('../pysal/examples/virginia.shp',idVariable='FIPS')
>>> w.n
136
>>> mtx = pysal.open('../pysal/examples/virginia_queen.mtx','w')
>>> mtx.write(w)
>>> mtx.close()
Examples: Converting the format of spatial weights files¶
PySAL provides a utility tool to convert a weights file from one format to another.
From GAL to ArcGIS SWM format
>>> import pysal
>>> from pysal.core.util.weight_converter import weight_convert
>>> gal_file = '../pysal/examples/sids2.gal'
>>> swm_file = '../pysal/examples/sids2.swm'
>>> weight_convert(gal_file, swm_file, useIdIndex=True)
>>> wold = pysal.open(gal_file, 'r').read()
>>> wnew = pysal.open(swm_file, 'r').read()
>>> wold.n == wnew.n
True
For further details see the FileIO API.
Alternative Tabular API¶
For shapefile input and output, the dataframe API constructs a dataframe similar to that constructed by Geopandas, but populated by PySAL’s own shape classes. This is provided as a convenience for users who have shapefile-heavy workflows and would like Geopandas-style interaction. This API is only a front-end to the existing PySAL API documented above, and users with heavier spatial data needs may find Geopandas useful.
Spatial Weights¶
Contents
Introduction¶
Spatial weights are central components of many areas of spatial analysis. In general terms, for a spatial data set composed of n locations (points, areal units, network edges, etc.), the spatial weights matrix expresses the potential for interaction between observations at each pair i,j of locations. There is a rich variety of ways to specify the structure of these weights, and PySAL supports the creation, manipulation and analysis of spatial weights matrices across three different general types:
 Contiguity Based Weights
 Distance Based Weights
 Kernel Weights
These different types of weights are implemented as instances or subclasses of the PySAL weights class W.
In what follows, we provide a high level overview of spatial weights in PySAL, starting with the three different types of weights, followed by a closer look at the properties of the W class and some related functions. [1]
PySAL Spatial Weight Types¶
PySAL weights are handled in objects of the class pysal.weights.W. The conceptual idea of spatial weights is that of an n x n matrix in which the diagonal elements (w_ii) are set to zero by definition and the rest of the cells (w_ij) capture the potential for interaction. However, these matrices tend to be fairly sparse (i.e. many cells contain zeros) and hence a full n x n array would not be an efficient representation. PySAL employs a different way of storing this information, structured in two main dictionaries [2]: neighbors, which for each observation (key) contains a list of the other observations (value) with potential for interaction (w_ij != 0); and weights, which contains the weight values for each of those observations (in the same order). This way, large datasets can be stored when keeping the full matrix would not be possible because of memory constraints. In addition to the sparse representation via the weights and neighbors dictionaries, a PySAL W object also has an attribute called sparse, which is a scipy.sparse CSR representation of the spatial weights. (See WSP for an alternative PySAL weights object.)
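The two-dictionary storage and its sparse counterpart can be sketched with plain Python and scipy, using the small neighbors example from the help text above. This is an illustration of the idea, not PySAL's W implementation:

```python
# Minimal sketch of the neighbors/weights dictionary storage described
# above, plus a scipy.sparse CSR view; illustrative only, not the W class.
import numpy as np
from scipy import sparse

neighbors = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
weights = {'a': [1.0], 'b': [1.0, 1.0], 'c': [1.0]}

# Lexicographic id order, as used when no explicit id_order is set.
ids = sorted(neighbors)
index = {i: pos for pos, i in enumerate(ids)}

rows, cols, vals = [], [], []
for i in ids:
    for j, wij in zip(neighbors[i], weights[i]):
        rows.append(index[i])
        cols.append(index[j])
        vals.append(wij)

W_sparse = sparse.csr_matrix((vals, (rows, cols)), shape=(len(ids), len(ids)))
print(W_sparse.toarray())  # diagonal is zero by definition
```

Only the nonzero cells are ever stored, which is exactly why the dictionary and CSR representations scale to large datasets.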
Contiguity Based Weights¶
To illustrate the general weights object, we start with a simple contiguity matrix constructed for a 5 by 5 lattice (composed of 25 spatial units):
>>> import pysal
>>> w = pysal.lat2W(5, 5)
The w object has a number of attributes:
>>> w.n
25
>>> w.pct_nonzero
12.8
>>> w.weights[0]
[1.0, 1.0]
>>> w.neighbors[0]
[5, 1]
>>> w.neighbors[5]
[0, 10, 6]
>>> w.histogram
[(2, 4), (3, 12), (4, 9)]
n is the number of spatial units, so conceptually we could be thinking that the weights are stored in a 25x25 matrix. The second attribute (pct_nonzero) shows the sparseness of the matrix. The key attributes used to store contiguity relations in W are the neighbors and weights attributes. In the example above we see that the observation with id 0 (Python is zero-offset) has two neighbors with ids [5, 1] each of which have equal weights of 1.0.
The histogram attribute is a set of tuples indicating the cardinality of the neighbor relations. In this case we have a regular lattice, so there are 4 units that have 2 neighbors (corner cells), 12 units with 3 neighbors (edge cells), and 9 units with 4 neighbors (internal cells).
In the above example, the default criterion for contiguity on the lattice was that of the rook which takes as neighbors any pair of cells that share an edge. Alternatively, we could have used the queen criterion to include the vertices of the lattice to define contiguities:
>>> wq = pysal.lat2W(rook = False)
>>> wq.neighbors[0]
[5, 1, 6]
The bishop criterion, which designates pairs of cells as neighbors if they share only a vertex, is yet a third alternative for contiguity weights. A bishop matrix can be computed as the difference between the rook and queen cases.
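The three contiguity criteria can be sketched in pure Python on the same 5x5 lattice. This is an illustration of the definitions, not PySAL's implementation, and lattice_neighbors is a hypothetical helper:

```python
# A pure-Python sketch of the lattice contiguity criteria discussed
# above: rook shares edges, bishop shares only vertices, and queen is
# their union. Illustrative only, not PySAL's implementation.
from collections import Counter

def lattice_neighbors(nrows, ncols, offsets):
    """Neighbor lists for a regular lattice under the given cell offsets."""
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            adj = [rr * ncols + cc
                   for dr, dc in offsets
                   for rr, cc in [(r + dr, c + dc)]
                   if 0 <= rr < nrows and 0 <= cc < ncols]
            w[r * ncols + c] = adj
    return w

ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]
BISHOP = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

rook = lattice_neighbors(5, 5, ROOK)
queen = lattice_neighbors(5, 5, ROOK + BISHOP)
bishop = lattice_neighbors(5, 5, BISHOP)

# Queen is the disjoint union of rook and bishop on a lattice.
assert all(set(queen[i]) == set(rook[i]) | set(bishop[i]) for i in queen)

# Cardinality histogram of the rook weights: 4 corners, 12 edge cells,
# 9 interior cells, matching w.histogram above.
cards = sorted(Counter(len(v) for v in rook.values()).items())
print(cards)  # [(2, 4), (3, 12), (4, 9)]
```

The assertion confirms that, cell by cell, the queen set is exactly the union of the rook and bishop sets.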
The lat2W function is particularly useful in setting up simulation experiments requiring a regular grid. For empirical research, a common use case is to have a shapefile, which is a non-topological vector data structure, and a need to carry out some form of spatial analysis that requires spatial weights. Since topology is not stored in the underlying file there is a need to construct the spatial weights prior to carrying out the analysis.
In PySAL, weights are constructed by default from any contiguity graph representation. Most users will find the .from_shapefile methods most useful:
>>> w = pysal.weights.Rook.from_shapefile("../pysal/examples/columbus.shp")
>>> w.n
49
>>> print("%.4f" % w.pct_nonzero)
0.0833
>>> w.histogram
[(2, 7), (3, 10), (4, 17), (5, 8), (6, 3), (7, 3), (8, 0), (9, 1)]
If queen, rather than rook, contiguity is required then the following would work:
>>> w = pysal.weights.Queen.from_shapefile("../pysal/examples/columbus.shp")
>>> print("%.4f" % w.pct_nonzero)
0.0983
>>> w.histogram
[(2, 5), (3, 9), (4, 12), (5, 5), (6, 9), (7, 3), (8, 4), (9, 1), (10, 1)]
In addition to these methods, contiguity weights can be built from dataframes with a geometry column. This includes dataframes built from geopandas or from the PySAL pandas IO extension, pdio. For instance:
>>> import geopandas as gpd
>>> test = gpd.read_file(pysal.examples.get_path('south.shp'))
>>> W = pysal.weights.Queen.from_dataframe(test)
>>> Wrook = pysal.weights.Rook.from_dataframe(test, idVariable='CNTY_FIPS')
>>> pdiodf = pysal.pdio.read_files(pysal.examples.get_path('south.shp'))
>>> W = pysal.weights.Rook.from_dataframe(pdiodf)
Or, weights can be constructed directly from an iterable of shapely objects:
>>> import geopandas as gpd
>>> shapelist = gpd.read_file(pysal.examples.get_path('columbus.shp')).geometry.tolist()
>>> W = pysal.weights.Queen.from_iterable(shapelist)
The .from_file method on contiguity weights simply passes down to the parent class’s .from_file method, so the returned object is an instance of W, not Queen or Rook. This occurs because the weights cannot be verified as contiguity weights without the original shapes.
>>> W = pysal.weights.Rook.from_file(pysal.examples.get_path('columbus.gal'))
>>> type(W)
pysal.weights.weights.W
Distance Based Weights¶
In addition to using contiguity to define neighbor relations, more general functions of the distance separating observations can be used to specify the weights.
Please note that distance calculations are coded for a flat surface, so you will need to have your shapefile projected in advance for the output to be correct.
k-nearest neighbor weights¶
The neighbors for a given observation can be defined using a k-nearest neighbor criterion. For example, we could use the centroids of our 5x5 lattice as point locations to measure the distances. First, we import numpy to create the coordinates as a 25x2 numpy array named data:
>>> import numpy as np
>>> x,y=np.indices((5,5))
>>> x.shape=(25,1)
>>> y.shape=(25,1)
>>> data=np.hstack([x,y])
then define the KNN weight as:
>>> wknn3 = pysal.weights.KNN(data, k = 3)
>>> wknn3.neighbors[0]
[1, 5, 6]
>>> wknn3.s0
75.0
For efficiency, a KDTree is constructed to support fast nearest neighbor queries. To construct many k-nearest neighbor weights from the same data, a convenience method is provided that avoids rebuilding the KDTree while letting the user change aspects of the weight object. By default, the reweight method operates in place:
>>> w4 = wknn3.reweight(k=4, inplace=False)
>>> w4.neighbors[0]
[1,5,6,2]
>>> l1norm = wknn3.reweight(p=1, inplace=False)
>>> l1norm.neighbors[0]
[1, 5, 2]
>>> set(w4.neighbors[0]) == set([1, 5, 6, 2])
True
>>> w4.s0
100.0
>>> w4.weights[0]
[1.0, 1.0, 1.0, 1.0]
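The KDTree mechanism behind these queries can be sketched directly with scipy on the same lattice coordinates. This reproduces the neighbor set of the first point as an illustration; it is not PySAL's KNN class:

```python
# A sketch of the mechanism behind KNN weights (not PySAL's class): a
# scipy KDTree answers nearest neighbor queries on the 5x5 lattice points.
import numpy as np
from scipy.spatial import KDTree

x, y = np.indices((5, 5))
data = np.hstack([x.reshape(25, 1), y.reshape(25, 1)])

tree = KDTree(data)
# Query k + 1 = 4 points because the nearest point to each site is itself.
dist, idx = tree.query(data[0], k=4)
knn3 = [int(j) for j in idx if j != 0]
print(sorted(knn3))  # the three nearest lattice cells to the corner (0, 0)
```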
Alternatively, we can use a utility function to build a knn W straight from a shapefile:
>>> wknn5 = pysal.weights.KNN.from_shapefile(pysal.examples.get_path('columbus.shp'), k=5)
>>> wknn5.neighbors[0]
[2, 1, 3, 7, 4]
Or from a dataframe:
>>> import geopandas as gpd
>>> df = gpd.read_file(pysal.examples.get_path('baltim.shp'))
>>> k5 = pysal.weights.KNN.from_dataframe(df, k=5)
Distance band weights¶
Knn weights ensure that all observations have the same number of neighbors. [3] An alternative distance based set of weights relies on distance bands or thresholds to define the neighbor set for each spatial unit as those other units falling within a threshold distance of the focal unit:
>>> wthresh = pysal.weights.DistanceBand.from_array(data, 2)
>>> set(wthresh.neighbors[0]) == set([1, 2, 5, 6, 10])
True
>>> set(wthresh.neighbors[1]) == set( [0, 2, 5, 6, 7, 11, 3])
True
>>> wthresh.weights[0]
[1, 1, 1, 1, 1]
>>> wthresh.weights[1]
[1, 1, 1, 1, 1, 1, 1]
>>>
As can be seen in the above example, the number of neighbors is likely to vary across observations with distance band weights in contrast to what holds for knn weights.
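A plain numpy sketch of the thresholding makes the varying neighbor counts visible on the same lattice coordinates. This illustrates the criterion only; PySAL's DistanceBand class uses spatial indexing rather than a full distance matrix:

```python
# Sketch of distance band neighbors with numpy: any other unit within
# Euclidean distance 2 of the focal unit is a neighbor. Illustrative
# only, not the DistanceBand implementation.
import numpy as np

x, y = np.indices((5, 5))
data = np.hstack([x.reshape(25, 1), y.reshape(25, 1)]).astype(float)

# Full pairwise distance matrix (fine for 25 points).
d = np.sqrt(((data[:, None, :] - data[None, :, :]) ** 2).sum(-1))
within = (d <= 2) & (d > 0)          # exclude the unit itself

neighbors = {i: np.flatnonzero(within[i]).tolist() for i in range(25)}
print(neighbors[0], len(neighbors[0]))
print(neighbors[1], len(neighbors[1]))
```

The corner unit 0 picks up 5 neighbors while unit 1 picks up 7, showing the varying cardinalities described above.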
In addition to constructing these from arrays with the helper function, DistanceBand weights can be built from other sources. For example, a binary threshold W can be constructed from a dataframe:
>>> import geopandas as gpd
>>> df = gpd.read_file(pysal.examples.get_path('baltim.shp'))
>>> pysal.weights.DistanceBand.from_dataframe(df, threshold=6, binary=True)
Distance band weights can be generated for shapefiles as well as arrays of points. [4] First, the minimum nearest neighbor distance should be determined so that each unit is assured of at least one neighbor:
>>> thresh = pysal.min_threshold_dist_from_shapefile("../pysal/examples/columbus.shp")
>>> thresh
0.61886415807685413
With this threshold in hand, the distance band weights are obtained as:
>>> wt = pysal.weights.DistanceBand.from_shapefile("../pysal/examples/columbus.shp", threshold=thresh, binary=True)
>>> wt.min_neighbors
1
>>> wt.histogram
[(1, 4), (2, 8), (3, 6), (4, 2), (5, 5), (6, 8), (7, 6), (8, 2), (9, 6), (10, 1), (11, 1)]
>>> set(wt.neighbors[0]) == set([1,2])
True
>>> set(wt.neighbors[1]) == set([3,0])
True
Distance band weights can also be specified to take on continuous values rather than binary, with the values set to the inverse distance separating each pair within a given threshold distance. We illustrate this with a small set of 6 points:
>>> points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
>>> wid = pysal.weights.DistanceBand.from_array(points,14.2,binary=False)
>>> wid.weights[0]
[0.10000000000000001, 0.089442719099991588]
If we change the distance decay exponent to 2.0, the result is so-called gravity weights:
>>> wid2 = pysal.weights.DistanceBand.from_array(points,14.2,alpha = 2.0, binary=False)
>>> wid2.weights[0]
[0.01, 0.0079999999999999984]
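The inverse-distance weights above follow the formula w_ij = (1/d_ij)^alpha for pairs within the band, which can be reproduced with a few lines of numpy. The band_weights helper is hypothetical, not a PySAL function:

```python
# Sketch of inverse-distance band weights, w_ij = (1 / d_ij) ** alpha for
# pairs within the threshold; band_weights is a hypothetical helper and
# not part of PySAL.
import numpy as np

points = np.array([(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)],
                  dtype=float)
d = np.sqrt(((points[:, None] - points[None, :]) ** 2).sum(-1))

def band_weights(i, threshold, alpha=1.0):
    """Inverse-distance weights from unit i to the others inside the band."""
    js = np.flatnonzero((d[i] > 0) & (d[i] < threshold))
    return [float(1.0 / d[i, j]) ** alpha for j in js]

print(band_weights(0, 14.2))           # ~[0.1, 0.0894], as in wid above
print(band_weights(0, 14.2, alpha=2))  # ~[0.01, 0.008], as in wid2 above
```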
Kernel Weights¶
A combination of distance based thresholds together with continuously valued weights is supported through kernel weights:
>>> points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
>>> kw = pysal.Kernel(points)
>>> kw.weights[0]
[1.0, 0.500000049999995, 0.4409830615267465]
>>> kw.neighbors[0]
[0, 1, 3]
>>> kw.bandwidth
array([[ 20.000002],
[ 20.000002],
[ 20.000002],
[ 20.000002],
[ 20.000002],
[ 20.000002]])
The bandwidth attribute plays the role of the distance threshold with kernel weights, while the form of the kernel function determines the distance decay in the derived continuous weights (the following are available: 'triangular', 'uniform', 'quadratic', 'epanechnikov', 'quartic', 'bisquare', 'gaussian'). In the above example, the bandwidth is set to the default value and fixed across the observations. The user could specify a different value for a fixed bandwidth:
>>> kw15 = pysal.Kernel(points,bandwidth = 15.0)
>>> kw15[0]
{0: 1.0, 1: 0.33333333333333337, 3: 0.2546440075000701}
>>> kw15.neighbors[0]
[0, 1, 3]
>>> kw15.bandwidth
array([[ 15.],
[ 15.],
[ 15.],
[ 15.],
[ 15.],
[ 15.]])
which results in fewer neighbors for the first unit. Adaptive bandwidths (i.e., different bandwidths for each unit) can also be user-specified:
>>> bw = [25.0,15.0,25.0,16.0,14.5,25.0]
>>> kwa = pysal.Kernel(points,bandwidth = bw)
>>> kwa.weights[0]
[1.0, 0.6, 0.552786404500042, 0.10557280900008403]
>>> kwa.neighbors[0]
[0, 1, 3, 4]
>>> kwa.bandwidth
array([[ 25. ],
[ 15. ],
[ 25. ],
[ 16. ],
[ 14.5],
[ 25. ]])
Alternatively the adaptive bandwidths could be defined endogenously:
>>> kwea = pysal.Kernel(points,fixed = False)
>>> kwea.weights[0]
[1.0, 0.10557289844279438, 9.99999900663795e-08]
>>> kwea.neighbors[0]
[0, 1, 3]
>>> kwea.bandwidth
array([[ 11.18034101],
[ 11.18034101],
[ 20.000002 ],
[ 11.18034101],
[ 14.14213704],
[ 18.02775818]])
Finally, the kernel function could be changed (with endogenous adaptive bandwidths):
>>> kweag = pysal.Kernel(points,fixed = False,function = 'gaussian')
>>> kweag.weights[0]
[0.3989422804014327, 0.2674190291577696, 0.2419707487162134]
>>> kweag.bandwidth
array([[ 11.18034101],
[ 11.18034101],
[ 20.000002 ],
[ 11.18034101],
[ 14.14213704],
[ 18.02775818]])
More details on kernel weights can be found in Kernel. All kernel methods also support construction from shapefiles with Kernel.from_shapefile and from dataframes with Kernel.from_dataframe.
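As a cross-check of the fixed-bandwidth values shown earlier, the triangular kernel (the default), K(z) = 1 - |z| with z = d/h, can be computed directly with numpy. This is a sketch, not the PySAL implementation:

```python
import numpy as np

def triangular_kernel(points, bandwidth):
    """Fixed-bandwidth triangular kernel: K(z) = 1 - |z| with z = d/h."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distance matrix
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    n = len(pts)
    neighbors = {i: [j for j in range(n) if d[i, j] <= bandwidth]
                 for i in range(n)}
    weights = {i: [1.0 - d[i, j] / bandwidth for j in neighbors[i]]
               for i in range(n)}
    return neighbors, weights

points = [(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
nbrs, w = triangular_kernel(points, 20.000002)
print(nbrs[0])  # [0, 1, 3]
print(w[0])     # [1.0, 0.500000049..., 0.440983...]
```

With the default bandwidth of 20.000002, the neighbors at distances 10 and sqrt(125) receive the weights 0.5 and 0.44 reported above.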
Weights from other python objects¶
PySAL weights can also be constructed easily from many other objects. Most importantly, all weight types can be constructed directly from geopandas geodataframes using the .from_dataframe method. For distance and kernel weights, the underlying features should typically be points. But if polygons are supplied, the centroids of the polygons will be used by default:
>>> import geopandas as gpd
>>> df = gpd.read_file(pysal.examples.get_path('columbus.shp'))
>>> kw = pysal.weights.Kernel.from_dataframe(df)
>>> dbb = pysal.weights.DistanceBand.from_dataframe(df, threshold=.9, binary=False)
>>> dbc = pysal.weights.DistanceBand.from_dataframe(df, threshold=.9, binary=True)
>>> q = pysal.weights.Queen.from_dataframe(df)
>>> r = pysal.weights.Rook.from_dataframe(df)
This also applies to dynamic views of the dataframe:
>>> q2 = pysal.weights.Queen.from_dataframe(df.query('DISCBD < 2'))
Weights can also be constructed from NetworkX objects. This is easiest to do by first building a sparse weights object, which can then be converted to a full, dense PySAL weights object:
>>> import networkx as nx
>>> G = nx.random_lobster(50,.2,.5)
>>> sparse_lobster = pysal.weights.WSP(nx.adj_matrix(G))
>>> dense_lobster = sparse_lobster.to_W()
A Closer look at W¶
Although the three different types of spatial weights illustrated above cover a wide array of approaches towards specifying spatial relations, they all share common attributes from the base W class in PySAL. Here we take a closer look at some of the more useful properties of this class.
Attributes of W¶
W objects come with a number of useful attributes that can help you when dealing with spatial weights matrices. To see a list of all of them, as with any other Python object, type:
>>> import pysal
>>> help(pysal.W)
If you want to be more specific and learn, for example, about the attribute s0, then type:
>>> help(pysal.W.s0)
Help on property:
    float
Weight Transformations¶
Often there is a need to apply a transformation to the spatial weights, such as in the case of row standardization. Here each value in a row of the spatial weights matrix is rescaled so that the row sums to one: w_{i,j}^{r} = w_{i,j} / \sum_j w_{i,j}.
This and other weights transformations in PySAL are supported by the transform property of the W class. To see this let’s build a simple contiguity object for the Columbus data set:
>>> w = pysal.rook_from_shapefile("../pysal/examples/columbus.shp")
>>> w.weights[0]
[1.0, 1.0]
We can row standardize this by setting the transform property:
>>> w.transform = 'r'
>>> w.weights[0]
[0.5, 0.5]
Supported transformations are the following:
- 'b': binary.
- 'r': row standardization.
- 'v': variance stabilizing.
If the original weights (unstandardized) are required, the transform property can be reset:
>>> w.transform = 'o'
>>> w.weights[0]
[1.0, 1.0]
Behind the scenes the transform property is updating all other characteristics of the spatial weights that are a function of the values and these standardization operations, freeing the user from having to keep these other attributes updated. To determine the current value of the transformation, simply query this attribute:
>>> w.transform
'O'
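In matrix terms, row standardization simply divides each row of the weights matrix by its row sum. A minimal numpy sketch (independent of PySAL):

```python
import numpy as np

# A toy binary contiguity matrix for three units on a line
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

# Row standardization: divide each row by its sum
W_r = W / W.sum(axis=1, keepdims=True)
print(W_r[1])           # [0.5 0.  0.5]
print(W_r.sum(axis=1))  # every row now sums to one
```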
More details on other transformations that are supported in W can be found in pysal.weights.W.
WSets¶
PySAL offers set-like manipulation of spatial weights matrices. While a W is more complex than a set, the two objects have a number of commonalities allowing for traditional set operations to have similar functionality on a W. Conceptually, we treat each neighbor pair as an element of a set, and then return the appropriate pairs based on the operation invoked (e.g. union, intersection, etc.). A key distinction between a set and a W is that a W must keep track of the universe of possible pairs, even those that do not result in a neighbor relationship.
PySAL follows the naming conventions for Python sets, but adds optional flags allowing the user to control the shape of the weights object returned. At this time, all the functions discussed in this section return a binary W no matter the weights objects passed in.
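The pair-based view of these operations can be sketched in pure Python with ordinary sets (a toy illustration only; the real functions also track ids and the universe of possible pairs):

```python
def pairs(neighbors):
    """Flatten a neighbors dictionary into a set of directed (i, j) pairs."""
    return {(i, j) for i, js in neighbors.items() for j in js}

# Two hypothetical neighbor dictionaries with partial overlap
wa = {0: [1], 1: [0, 2], 2: [1]}
wb = {1: [2], 2: [1, 3], 3: [2]}

print(sorted(pairs(wa) | pairs(wb)))  # union of neighbor pairs
print(sorted(pairs(wa) & pairs(wb)))  # intersection: [(1, 2), (2, 1)]
print(sorted(pairs(wa) - pairs(wb)))  # difference: [(0, 1), (1, 0)]
```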
Union¶
The union of two weights objects returns a binary weights object, W, that includes all neighbor pairs that exist in either weights object. This function can be used to simply join together two weights objects, say one for Arizona counties and another for California counties. It can also be used to join two weights objects that overlap as in the example below.
>>> w1 = pysal.lat2W(4,4)
>>> w2 = pysal.lat2W(6,4)
>>> w = pysal.w_union(w1, w2)
>>> w1[0] == w[0]
True
>>> w1.neighbors[15]
[11, 14]
>>> w2.neighbors[15]
[11, 14, 19]
>>> w.neighbors[15]
[19, 11, 14]
Intersection¶
The intersection of two weights objects returns a binary weights object, W, that includes only those neighbor pairs that exist in both weights objects. Unlike the union case, where all pairs in either matrix are returned, the intersection only returns a subset of the pairs. This leaves open the question of the shape of the weights matrix to return. For example, you have one weights matrix of census tracts for City A and a second matrix of tracts for Utility Company B’s service area, and want to find the W for the tracts that overlap. Depending on the research question, you may want the returned W to have the same dimensions as City A’s weights matrix, the same as the utility company’s weights matrix, a new dimensionality based on all the census tracts in either matrix, or the dimensionality of just those tracts in the overlapping area. All of these options are available via the w_shape parameter and the order in which the matrices are passed to the function. The following example uses the all case:
>>> w1 = pysal.lat2W(4,4)
>>> w2 = pysal.lat2W(6,4)
>>> w = pysal.w_intersection(w1, w2, 'all')
WARNING: there are 8 disconnected observations
Island ids: [16, 17, 18, 19, 20, 21, 22, 23]
>>> w1[0] == w[0]
True
>>> w1.neighbors[15]
[11, 14]
>>> w2.neighbors[15]
[11, 14, 19]
>>> w.neighbors[15]
[11, 14]
>>> w2.neighbors[16]
[12, 20, 17]
>>> w.neighbors[16]
[]
Difference¶
The difference of two weights objects returns a binary weights object, W, that includes only neighbor pairs from the first object that are not in the second. Similar to the intersection function, the user must select the shape of the weights object returned using the w_shape parameter. The user must also consider the constrained parameter, which controls whether the observations and the neighbor pairs are differenced or just the neighbor pairs are differenced. If you were to apply the difference function to our city and utility company example from the intersection section above, you must decide whether pairs that exist along the border of the regions should be considered different. It boils down to whether the tracts should be differenced first and then the differenced pairs identified (constrained=True), or whether the differenced pairs should be identified based on the sets of pairs in the original weights matrices (constrained=False). In the example below we difference weights matrices from regions with partial overlap.
>>> w1 = pysal.lat2W(6,4)
>>> w2 = pysal.lat2W(4,4)
>>> w1.neighbors[15]
[11, 14, 19]
>>> w2.neighbors[15]
[11, 14]
>>> w = pysal.w_difference(w1, w2, w_shape = 'w1', constrained = False)
WARNING: there are 12 disconnected observations
Island ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> w.neighbors[15]
[19]
>>> w.neighbors[19]
[15, 18, 23]
>>> w = pysal.w_difference(w1, w2, w_shape = 'min', constrained = False)
>>> 15 in w.neighbors
False
>>> w.neighbors[19]
[18, 23]
>>> w = pysal.w_difference(w1, w2, w_shape = 'w1', constrained = True)
WARNING: there are 16 disconnected observations
Island ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
>>> w.neighbors[15]
[]
>>> w.neighbors[19]
[18, 23]
>>> w = pysal.w_difference(w1, w2, w_shape = 'min', constrained = True)
>>> 15 in w.neighbors
False
>>> w.neighbors[19]
[18, 23]
The difference function can be used to construct a bishop contiguity weights matrix by differencing a queen and rook matrix.
>>> wr = pysal.lat2W(5,5)
>>> wq = pysal.lat2W(5,5,rook = False)
>>> wb = pysal.w_difference(wq, wr,constrained = False)
>>> wb.neighbors[0]
[6]
Symmetric Difference¶
The symmetric difference of two weights objects returns a binary weights object, W, that includes only those neighbor pairs that exist in one of the two weights objects but not both. This function offers options similar to those in the difference function described above.
>>> w1 = pysal.lat2W(6, 4)
>>> w2 = pysal.lat2W(2, 4)
>>> w_lower = pysal.w_difference(w1, w2, w_shape = 'min', constrained = True)
>>> w_upper = pysal.lat2W(4, 4)
>>> w = pysal.w_symmetric_difference(w_lower, w_upper, 'all', False)
>>> w_lower.id_order
[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
>>> w_upper.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
>>> w.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
>>> w.neighbors[11]
[7]
>>> w = pysal.w_symmetric_difference(w_lower, w_upper, 'min', False)
WARNING: there are 8 disconnected observations
Island ids: [0, 1, 2, 3, 4, 5, 6, 7]
>>> 11 in w.neighbors
False
>>> w.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
>>> w = pysal.w_symmetric_difference(w_lower, w_upper, 'all', True)
WARNING: there are 16 disconnected observations
Island ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
>>> w.neighbors[11]
[]
>>> w = pysal.w_symmetric_difference(w_lower, w_upper, 'min', True)
WARNING: there are 8 disconnected observations
Island ids: [0, 1, 2, 3, 4, 5, 6, 7]
>>> 11 in w.neighbors
False
Subset¶
The subset of a weights object returns a binary weights object, W, that includes only those observations provided by the user. It can also be used to add islands to a previously existing weights object.
>>> w1 = pysal.lat2W(6, 4)
>>> w1.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
>>> ids = range(16)
>>> w = pysal.w_subset(w1, ids)
>>> w.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
>>> w1[0] == w[0]
True
>>> w1.neighbors[15]
[11, 14, 19]
>>> w.neighbors[15]
[11, 14]
WSP¶
A thin PySAL weights object is available to users with extremely large weights matrices, on the order of 2 million or more observations, or users interested in holding many large weights matrices in RAM simultaneously. The pysal.weights.WSP is a thin weights object that does not include the neighbors and weights dictionaries, but does contain the scipy.sparse form of the weights. For many PySAL functions the W and WSP objects can be used interchangeably.
A WSP object can be constructed from a Matrix Market file (see MatrixMarket MTX Weights Files for more info on reading and writing mtx files in PySAL):
>>> mtx = pysal.open("../pysal/examples/wmat.mtx", 'r')
>>> wsp = mtx.read(sparse=True)
or built directly from a scipy.sparse object:
>>> import scipy.sparse
>>> rows = [0, 1, 1, 2, 2, 3]
>>> cols = [1, 0, 2, 1, 3, 3]
>>> weights = [1, 0.75, 0.25, 0.9, 0.1, 1]
>>> sparse = scipy.sparse.csr_matrix((weights, (rows, cols)), shape=(4,4))
>>> w = pysal.weights.WSP(sparse)
The WSP object has a subset of the attributes of a W object; for example:
>>> w.n
4
>>> w.s0
4.0
>>> w.trcWtW_WW
6.3949999999999996
The following functionality is available to convert from a W to a WSP:
>>> w = pysal.weights.lat2W(5,5)
>>> w.s0
80.0
>>> wsp = pysal.weights.WSP(w.sparse)
>>> wsp.s0
80.0
and from a WSP to W:
>>> sp = pysal.weights.lat2SW(5, 5)
>>> wsp = pysal.weights.WSP(sp)
>>> wsp.s0
80
>>> w = pysal.weights.WSP2W(wsp)
>>> w.s0
80
Further Information¶
For further details see the Weights API.
Footnotes
[1]  Although this tutorial provides an introduction to the functionality of the PySAL weights class, it is not exhaustive. Complete documentation for the class and associated functions can be found by accessing the help from within a Python interpreter. 
[2]  The dictionaries for the weights and value attributes in W are read-only. 
[3]  Ties at the kth nearest neighbor distance band are randomly broken to ensure each observation has exactly k neighbors. 
[4]  If the shapefile contains geographical coordinates, these distance calculations will be misleading and the user should first project their coordinates using a GIS. 
[5]  The ordering exploits the one-to-one relation between a record in the DBF file and the shape in the shapefile. 
Spatial Autocorrelation¶
Contents
Introduction¶
Spatial autocorrelation pertains to the non-random pattern of attribute values over a set of spatial units. This can take two general forms: positive autocorrelation, which reflects value similarity in space, and negative autocorrelation, or value dissimilarity in space. In either case the autocorrelation arises when the observed spatial pattern is different from what would be expected under a random process operating in space.
Spatial autocorrelation can be analyzed from two different perspectives. Global autocorrelation analysis involves the study of the entire map pattern and generally asks whether the pattern displays clustering or not. Local autocorrelation, on the other hand, shifts the focus to explore within the global pattern to identify clusters or so-called hot spots that may be either driving the overall clustering pattern, or that reflect heterogeneities that depart from the global pattern.
In what follows, we first highlight the global spatial autocorrelation classes in PySAL. This is followed by an illustration of the analysis of local spatial autocorrelation.
Global Autocorrelation¶
PySAL implements five different tests for global spatial autocorrelation: the Gamma index of spatial autocorrelation, join count statistics, Moran’s I, Geary’s C, and Getis and Ord’s G.
Gamma Index of Spatial Autocorrelation¶
The Gamma Index of spatial autocorrelation consists of the application of the principle behind a general cross-product statistic to measuring spatial autocorrelation. [1] The idea is to assess whether two similarity matrices for n objects, i.e., n by n matrices A and B, measure the same type of similarity. This is reflected in a so-called Gamma Index. In other words, the statistic consists of the sum over all cross-products of matching elements (i,j) in the two matrices.
The application of this principle to spatial autocorrelation consists of turning the first similarity matrix into a measure of attribute similarity and the second matrix into a measure of locational similarity. Naturally, the second matrix is a spatial weights matrix. The first matrix can be any reasonable measure of attribute similarity or dissimilarity, such as a cross-product, squared difference or absolute difference.
Formally, then, the Gamma index is:

\Gamma = \sum_i \sum_j w_{ij} a_{ij}

where the w_{ij} are the elements of the weights matrix and the a_{ij} are corresponding measures of attribute similarity.
Inference for this statistic is based on a permutation approach in which the values are shuffled around among the locations and the statistic is recomputed each time. This creates a reference distribution for the statistic under the null hypothesis of spatial randomness. The observed statistic is then compared to this reference distribution and a pseudo-significance computed as

p = (m + 1) / (n + 1)

where m is the number of values from the reference distribution that are equal to or greater than the observed statistic and n is the number of permutations.
The Gamma test is a two-sided test in the sense that both extremely high values (e.g., larger than any value in the reference distribution) and extremely low values (e.g., smaller than any value in the reference distribution) can be considered to be significant. Depending on how the measure of attribute similarity is defined, a high value will indicate positive or negative spatial autocorrelation, and vice versa. For example, for a cross-product measure of attribute similarity, high values indicate positive spatial autocorrelation and low values negative spatial autocorrelation. For a squared difference measure, it is the reverse. This is similar to the interpretation of the Moran’s I statistic and Geary’s C statistic, respectively.
Many spatial autocorrelation test statistics can be shown to be special cases of the Gamma index. In most instances, the Gamma index is an unstandardized version of the commonly used statistics. As such, the Gamma index is scale dependent, since no normalization is carried out (such as deviations from the mean or rescaling by the variance). Also, since the sum is over all the elements, the value of a Gamma statistic will grow with the sample size, everything else being the same.
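The definition of the Gamma index is straightforward to compute directly. The following numpy sketch (independent of PySAL) reproduces the cross-product Gamma value of 20.0 for a 4 by 4 rook lattice with 0s in the top half and 1s in the bottom half, the same artificial example used below:

```python
import numpy as np

def rook_lattice(nrows, ncols):
    """Binary rook contiguity matrix for a regular nrows x ncols lattice."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        if c + 1 < ncols:                      # neighbor to the right
            W[i, i + 1] = W[i + 1, i] = 1.0
        if r + 1 < nrows:                      # neighbor below
            W[i, i + ncols] = W[i + ncols, i] = 1.0
    return W

W = rook_lattice(4, 4)
y = np.ones(16)
y[:8] = 0  # zeros in the top half, ones in the bottom half

# Gamma with cross-product attribute similarity: a_ij = y_i * y_j
gamma = (W * np.outer(y, y)).sum()
print(gamma)  # 20.0
```

Since W is symmetric and each of the 10 joins among the bottom-half ones is counted in both directions, the sum is 20.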
PySAL implements four forms of the Gamma index. Three of these are pre-specified and one allows the user to pass any function that computes a measure of attribute similarity. This function should take three parameters: the vector of observations, an index i and an index j.
We will illustrate the Gamma index using the same small artificial example as we use for the Join Count Statistics in order to illustrate the similarities and differences between them. The data consist of a regular 4 by 4 lattice with values of 0 in the top half and values of 1 in the bottom half. We start with the usual imports, and set the random seed to 12345 in order to be able to replicate the results of the permutation approach.
>>> import pysal
>>> import numpy as np
>>> np.random.seed(12345)
We create the binary weights matrix for the 4 x 4 lattice and generate the observation vector y:
>>> w=pysal.lat2W(4,4)
>>> y=np.ones(16)
>>> y[0:8]=0
The Gamma index function has five arguments, three of which are optional. The first two arguments are the vector of observations (y) and the spatial weights object (w). Next is operation, the measure of attribute similarity, the default of which is operation = 'c' for cross-product similarity, a_{ij} = y_i y_j. The other two built-in options are operation = 's' for squared difference, a_{ij} = (y_i - y_j)^2, and operation = 'a' for absolute difference, a_{ij} = |y_i - y_j|. The fourth option is to pass an arbitrary attribute similarity function, as in operation = func, where func is a function with three arguments, def func(y,i,j), with y as the vector of observations, and i and j as indices. This function should return a single value for attribute similarity.
The fourth argument allows the observed values to be standardized before the calculation of the Gamma index. To some extent, this addresses the scale dependence of the index, but not its dependence on the number of observations. The default is no standardization, standardize = 'no'. To force standardization, set standardize = 'yes' or 'y'. The final argument is the number of permutations, permutations, with the default set to 999.
As a first illustration, we invoke the Gamma index using all the default values, i.e. cross-product similarity, no standardization, and permutations set to 999. The interesting statistics are the magnitude of the Gamma index g, the standardized Gamma index using the mean and standard deviation from the reference distribution, g_z, and the pseudo-p value obtained from the permutation, p_sim_g. In addition, the minimum (min_g), maximum (max_g) and mean (mean_g) of the reference distribution are available as well.
>>> g = pysal.Gamma(y,w)
>>> g.g
20.0
>>> "%.3f"%g.g_z
'3.188'
>>> g.p_sim_g
0.0030000000000000001
>>> g.min_g
0.0
>>> g.max_g
20.0
>>> g.mean_g
11.093093093093094
Note that the value for Gamma is exactly twice the BB statistic obtained in the example below, since the attribute similarity criterion is identical, but Gamma is not divided by 2.0. The observed value is very extreme, with only two replications from the permutation equalling the value of 20.0. This indicates significant positive spatial autocorrelation.
As a second illustration, we use the squared difference criterion, which corresponds to the BW Join Count statistic. We reset the random seed to keep comparability of the results.
>>> np.random.seed(12345)
>>> g1 = pysal.Gamma(y,w,operation='s')
>>> g1.g
8.0
>>> "%.3f"%g1.g_z
'-3.706'
>>> g1.p_sim_g
0.001
>>> g1.min_g
14.0
>>> g1.max_g
48.0
>>> g1.mean_g
25.623623623623622
The Gamma index value of 8.0 is exactly twice the value of the BW statistic for this example. However, since the Gamma index is used for a two-sided test, this value is highly significant, and with a negative z-value, this suggests positive spatial autocorrelation (similar to Geary’s C). In other words, this result is consistent with the finding for the Gamma index that used cross-product similarity.
As a third example, we use the absolute difference for attribute similarity. The results are identical to those for squared difference since these two criteria are equivalent for 0-1 values.
>>> np.random.seed(12345)
>>> g2 = pysal.Gamma(y,w,operation='a')
>>> g2.g
8.0
>>> "%.3f"%g2.g_z
'-3.706'
>>> g2.p_sim_g
0.001
>>> g2.min_g
14.0
>>> g2.max_g
48.0
>>> g2.mean_g
25.623623623623622
We next illustrate the effect of standardization, using the default operation. As shown, the value of the statistic is quite different from the unstandardized form, but the inference is equivalent.
>>> np.random.seed(12345)
>>> g3 = pysal.Gamma(y,w,standardize='y')
>>> g3.g
32.0
>>> "%.3f"%g3.g_z
'3.706'
>>> g3.p_sim_g
0.001
>>> g3.min_g
-48.0
>>> g3.max_g
20.0
>>> "%.3f"%g3.mean_g
'-3.247'
Note that all the tests shown here have used the weights matrix in binary form. However, since the Gamma index is perfectly general, any standardization can be applied to the weights.
Finally, we illustrate the use of an arbitrary attribute similarity function. In order to compare to the results above, we will define a function that produces a cross-product similarity measure. We will then pass this function to the operation argument of the Gamma index.
>>> np.random.seed(12345)
>>> def func(z,i,j):
... q = z[i]*z[j]
... return q
...
>>> g4 = pysal.Gamma(y,w,operation=func)
>>> g4.g
20.0
>>> "%.3f"%g4.g_z
'3.188'
>>> g4.p_sim_g
0.0030000000000000001
As expected, the results are identical to those obtained with the default operation.
Join Count Statistics¶
The join count statistics measure global spatial autocorrelation for binary data, i.e., with observations coded as 1 or B (for Black) and 0 or W (for White). They follow the very simple principle of counting joins, i.e., the arrangement of values between pairs of observations where the pairs correspond to neighbors. The three resulting join count statistics are BB, WW and BW. Both BB and WW are measures of positive spatial autocorrelation, whereas BW is an indicator of negative spatial autocorrelation.
To implement the join count statistics, we need the spatial weights matrix in binary (not row-standardized) form. With y as the vector of 0-1 observations and w_{ij} as the spatial weight, the three statistics can be expressed as:

BB = \frac{1}{2} \sum_i \sum_j y_i y_j w_{ij}

WW = \frac{1}{2} \sum_i \sum_j (1 - y_i)(1 - y_j) w_{ij}

BW = \frac{1}{2} \sum_i \sum_j (y_i - y_j)^2 w_{ij}

By convention, the join counts are divided by 2 to avoid double counting. Also, since the three joins exhaust all the possibilities, they sum to one half (because of the division by 2) of the total sum of weights S_0 = \sum_i \sum_j w_{ij}.
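The three join counts can be computed directly with numpy (a sketch independent of PySAL, using the same 4 by 4 rook lattice with 0s in the top half and 1s in the bottom half as in the examples below):

```python
import numpy as np

def rook_lattice(nrows, ncols):
    """Binary rook contiguity matrix for a regular nrows x ncols lattice."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        if c + 1 < ncols:
            W[i, i + 1] = W[i + 1, i] = 1.0
        if r + 1 < nrows:
            W[i, i + ncols] = W[i + ncols, i] = 1.0
    return W

W = rook_lattice(4, 4)
y = np.ones(16)
y[:8] = 0

# Join counts, halved to avoid double counting on the symmetric W
bb = 0.5 * (W * np.outer(y, y)).sum()
ww = 0.5 * (W * np.outer(1 - y, 1 - y)).sum()
bw = 0.5 * (W * (y[:, None] - y[None, :]) ** 2).sum()
J = 0.5 * W.sum()
print(bb, ww, bw, J)  # 10.0 10.0 4.0 24.0
```

Note that bb + ww + bw equals J, half the total sum of weights.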
Inference for the join count statistics can be based on either an analytical approach or a computational approach. The analytical approach starts from the binomial distribution and derives the moments of the statistics under the assumption of free sampling and non-free sampling. The resulting mean and variance are used to construct a standardized z-variable which can be approximated as a standard normal variate. [2] However, the approximation is often poor in practice. We therefore only implement the computational approach.
Computational inference is based on a permutation approach in which the values of y are randomly reshuffled many times to obtain a reference distribution of the statistics under the null hypothesis of spatial randomness. The observed join count is then compared to this reference distribution and a pseudo-significance computed as

p = (m + 1) / (n + 1)

where m is the number of values from the reference distribution that are equal to or greater than the observed join count and n is the number of permutations. Note that the join count statistics are a one-sided test. If the counts are extremely smaller than the reference distribution, this is not an indication of significance. For example, if the BW counts are extremely small, this is not an indication of negative BW autocorrelation, but instead points to the presence of BB or WW autocorrelation.
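The permutation logic itself is simple; here is a minimal sketch (independent of PySAL) using a toy statistic, the number of adjacent equal pairs along a line of 16 observations:

```python
import numpy as np

def bb_count(y):
    """Toy statistic: number of adjacent equal pairs along a line."""
    return int((y[:-1] == y[1:]).sum())

def pseudo_p(y, stat_func, permutations=999, seed=12345):
    """One-sided pseudo significance: p = (m + 1) / (permutations + 1),
    where m counts permuted statistics >= the observed one."""
    observed = stat_func(y)
    rng = np.random.default_rng(seed)
    m = sum(stat_func(rng.permutation(y)) >= observed
            for _ in range(permutations))
    return (m + 1) / (permutations + 1)

y = np.array([0] * 8 + [1] * 8)  # perfectly separated halves
print(pseudo_p(y, bb_count))     # a small pseudo-p value
```

Since the observed count of 14 is the maximum possible, hardly any random reshuffle matches it, and the pseudo-p value is close to its floor of 1/(permutations + 1).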
We will illustrate the join count statistics with a simple artificial example of a 4 by 4 square lattice with values of 0 in the top half and values of 1 in the bottom half.
We start with the usual imports, and set the random seed to 12345 in order to be able to replicate the results of the permutation approach.
>>> import pysal
>>> import numpy as np
>>> np.random.seed(12345)
We create the binary weights matrix for the 4 x 4 lattice and generate the observation vector y:
>>> w=pysal.lat2W(4,4)
>>> y=np.ones(16)
>>> y[0:8]=0
We obtain an instance of the join count statistics BB, BW and WW as follows (J is half the sum of all the weights and should equal the sum of BB, WW and BW):
>>> jc=pysal.Join_Counts(y,w)
>>> jc.bb
10.0
>>> jc.bw
4.0
>>> jc.ww
10.0
>>> jc.J
24.0
The number of permutations is set to 999 by default. For other values, this parameter needs to be passed explicitly, as in:
>>> jc=pysal.Join_Counts(y,w,permutations=99)
The results in our simple example show that the BB counts are 10. There are in fact 3 horizontal joins in each of the bottom rows of the lattice as well as 4 vertical joins, which makes for bb = 3 + 3 + 4 = 10. The BW joins are 4, matching the separation between the bottom and top part.
The permutation results give a pseudop value for BB of 0.003, suggesting highly significant positive spatial autocorrelation. The average BB count for the sample of 999 replications is 5.5, quite a bit lower than the count of 10 we obtain. Only two instances of the replicated samples yield a value equal to 10, none is greater (the randomly permuted samples yield bb values between 0 and 10).
>>> len(jc.sim_bb)
999
>>> jc.p_sim_bb
0.0030000000000000001
>>> np.mean(jc.sim_bb)
5.5465465465465469
>>> np.max(jc.sim_bb)
10.0
>>> np.min(jc.sim_bb)
0.0
The results for BW (negative spatial autocorrelation) show a probability of 1.0 under the null hypothesis. This means that all the values of BW from the randomly permuted data sets were larger than the observed value of 4. In fact the range of these values is between 7 and 24. In other words, this again strongly points towards the presence of positive spatial autocorrelation. The observed number of BB and WW joins (10 each) is so high that there are hardly any BW joins (4).
>>> len(jc.sim_bw)
999
>>> jc.p_sim_bw
1.0
>>> np.mean(jc.sim_bw)
12.811811811811811
>>> np.max(jc.sim_bw)
24.0
>>> np.min(jc.sim_bw)
7.0
Moran’s I¶
Moran’s I measures the global spatial autocorrelation in an attribute y measured over n spatial units and is given as:

I = \frac{n}{S_0} \frac{\sum_i \sum_j z_i w_{ij} z_j}{\sum_i z_i^2}

where w_{ij} is a spatial weight, z_i = y_i - \bar{y}, and S_0 = \sum_i \sum_j w_{ij}. We illustrate the use of Moran’s I with a case study of homicide rates for a group of 78 counties surrounding St. Louis over the period 1988-93. [3] We start with the usual imports:
>>> import pysal
>>> import numpy as np
Next, we read in the homicide rates:
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
To calculate Moran’s I we first need to read in a GAL file for a rook weights matrix and create an instance of W:
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
The instance of Moran’s I can then be obtained with:
>>> mi = pysal.Moran(y, w, two_tailed=False)
>>> "%.3f"%mi.I
'0.244'
>>> mi.EI
-0.012987012987012988
>>> "%.5f"%mi.p_norm
'0.00014'
From these results, we see that the observed value for I is significantly above its expected value, under the assumption of normality for the homicide rates.
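The formula for I can also be verified from first principles. The following numpy sketch (independent of PySAL and of the St. Louis data) computes Moran’s I for the artificial 4 by 4 lattice used in the Gamma and join count examples:

```python
import numpy as np

def rook_lattice(nrows, ncols):
    """Binary rook contiguity matrix for a regular nrows x ncols lattice."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        if c + 1 < ncols:
            W[i, i + 1] = W[i + 1, i] = 1.0
        if r + 1 < nrows:
            W[i, i + ncols] = W[i + ncols, i] = 1.0
    return W

def morans_i(y, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2."""
    z = y - y.mean()
    n = len(y)
    s0 = W.sum()
    return (n / s0) * (W * np.outer(z, z)).sum() / (z ** 2).sum()

W = rook_lattice(4, 4)
y = np.ones(16)
y[:8] = 0
print(round(morans_i(y, W), 4))  # 0.6667
```

The strongly separated halves yield I = 2/3, well above the expected value of -1/(n-1), consistent with strong positive autocorrelation.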
If we peek inside the mi object to learn more:
>>> help(mi)
which generates:
Help on instance of Moran in module pysal.esda.moran:
class Moran
    Moran's I Global Autocorrelation Statistic

    Parameters
    ----------
    y            : array
                   variable measured across n spatial units
    w            : W
                   spatial weights instance
    permutations : int
                   number of random permutations for calculation of pseudo p-values

    Attributes
    ----------
    y            : array
                   original variable
    w            : W
                   original w object
    permutations : int
                   number of permutations
    I            : float
                   value of Moran's I
    EI           : float
                   expected value under normality assumption
    VI_norm      : float
                   variance of I under normality assumption
    seI_norm     : float
                   standard deviation of I under normality assumption
    z_norm       : float
                   z-value of I under normality assumption
    p_norm       : float
                   p-value of I under normality assumption (one-sided);
                   for two-sided tests, this value should be multiplied by 2
    VI_rand      : float
                   variance of I under randomization assumption
    seI_rand     : float
                   standard deviation of I under randomization assumption
    z_rand       : float
                   z-value of I under randomization assumption
    p_rand       : float
                   p-value of I under randomization assumption (one-sided)
    sim          : array (if permutations>0)
we see that we can base the inference not only on the normality assumption, but also on random permutations of the values on the spatial units to generate a reference distribution for I under the null:
>>> np.random.seed(10)
>>> mir = pysal.Moran(y, w, permutations = 9999)
The pseudo p value based on these permutations is:
>>> print mir.p_sim
0.0022
in other words there were 21 permutations that generated values for I that were as extreme as the original value, so the p value becomes (21+1)/(9999+1). [4] Alternatively, we could use the realized values for I from the permutations and compare the original I using a z-transformation to get:
>>> print mir.EI_sim
-0.0118217511619
>>> print mir.z_sim
4.55451777821
>>> print mir.p_z_sim
2.62529422013e06
When the variable of interest (y) is a rate based on populations with different sizes, the Moran’s I value for y needs to be adjusted to account for the differences among populations. [5] To apply this adjustment, we can create an instance of the Moran_Rate class rather than the Moran class. For example, let’s assume that we want to estimate the Moran’s I for the rates of newborn infants who died of Sudden Infant Death Syndrome (SIDS). We start this estimation by reading in the total number of newborn infants (BIR79) and the total number of newborn infants who died of SIDS (SID79):
>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))
Next, we create an instance of W:
>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()
Now, we create an instance of Moran_Rate:
>>> mi = pysal.esda.moran.Moran_Rate(e, b, w, two_tailed=False)
>>> "%6.4f" % mi.I
'0.1662'
>>> "%6.4f" % mi.EI
'0.0101'
>>> "%6.4f" % mi.p_norm
'0.0042'
From these results, we see that the observed value for I is significantly higher than its expected value, after the adjustment for the differences in population.
Geary’s C¶
The fourth statistic for global spatial autocorrelation implemented in PySAL is Geary’s C:

C = \frac{(n - 1) \sum_i \sum_j w_{ij} (y_i - y_j)^2}{2 S_0 \sum_i z_i^2}

with all the terms defined as above. Applying this to the St. Louis data:
>>> np.random.seed(12345)
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
>>> gc = pysal.Geary(y, w)
>>> "%.3f"%gc.C
'0.597'
>>> gc.EC
1.0
>>> "%.3f"%gc.z_norm
'5.449'
we see that the statistic is significantly lower than its expected value of 1. Although the sign of the standardized statistic is negative (in contrast to what held for Moran’s I), the interpretation is the same, namely evidence of strong positive spatial autocorrelation in the homicide rates.
Similar to what we saw for Moran’s I, we can base inference on Geary’s C using random spatial permutations, which are run by default with permutations=999 (this is why we set the seed of the random number generator to 12345 to replicate the result):
>>> gc.p_sim
0.001
which indicates that none of the C values from the permuted samples was as extreme as our observed value.
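Since the statistic is defined entirely by the weights and the attribute values, Geary's C can be sketched directly in numpy; the toy data and binary ring weights below are hypothetical:

```python
import numpy as np

def gearys_c(y, w):
    """Geary's C for an n x n binary weight matrix w."""
    z = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2   # (y_i - y_j)^2 for all pairs
    return (len(y) - 1) * (w * diff2).sum() / (2.0 * w.sum() * (z @ z))

# hypothetical toy data on a ring of four units
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
c = gearys_c(y, w)   # values below the expectation of 1 indicate positive autocorrelation
```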
Getis and Ord’s G¶
The last statistic for global spatial autocorrelation implemented in PySAL is Getis and Ord’s G:
G(d) = Σi Σj wij(d) yi yj / Σi Σj yi yj, for i ≠ j,
where d is a threshold distance used to define a spatial weight.
Only pysal.weights.Distance.DistanceBand
weights objects are applicable to Getis and Ord’s G.
Applying this to the St. Louis data:
>>> dist_w = pysal.threshold_binaryW_from_shapefile('../pysal/examples/stl_hom.shp',0.6)
>>> dist_w.transform = "B"
>>> from pysal.esda.getisord import G
>>> g = G(y, dist_w)
>>> print g.G
0.103483215873
>>> print g.EG
0.0752580752581
>>> print g.z_norm
3.28090342959
>>> print g.p_norm
0.000517375830488
Although we switched from the contiguity-based weights object to a distance-based one, we see that the statistic is significantly higher than its expected value under the assumption of normality for the homicide rates.
Similar to what we saw for Moran’s I and Geary’s C, we can base inference on Getis and Ord’s G using random spatial permutations:
>>> np.random.seed(12345)
>>> g = G(y, dist_w, permutations=9999)
>>> print g.p_z_sim
0.000564384586974
>>> print g.p_sim
0.0065
with the first p-value based on a z-transform of the observed G relative to the distribution of values obtained in the permutations, and the second based on the cumulative probability of the observed value in the empirical distribution.
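The global G itself is simple to sketch in numpy for binary weights; the toy data below are hypothetical, and the ring matrix stands in for a distance-band weights matrix:

```python
import numpy as np

def getis_ord_g(y, w):
    """Global Getis-Ord G for a binary weight matrix with a zero diagonal."""
    num = (w * np.outer(y, y)).sum()                 # weighted cross-products, i != j
    den = np.outer(y, y).sum() - (y ** 2).sum()      # all cross-products, i != j
    return num / den

# hypothetical toy data
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
g = getis_ord_g(y, w)
```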
Local Autocorrelation¶
To measure local autocorrelation quantitatively, PySAL implements Local Indicators of Spatial Association (LISAs) for Moran’s I and Getis and Ord’s G.
Local Moran’s I¶
PySAL implements local Moran’s I as follows:
Ii = (zi / m2) Σj wij zj, where zi = yi - ȳ and m2 = Σi zi² / n,
which results in n values of local spatial autocorrelation, one for each spatial unit. Continuing on with the St. Louis example, the LISA statistics are obtained with:
>>> f = pysal.open(pysal.examples.get_path("stl_hom.txt"))
>>> y = np.array(f.by_col['HR8893'])
>>> w = pysal.open(pysal.examples.get_path("stl.gal")).read()
>>> np.random.seed(12345)
>>> lm = pysal.Moran_Local(y,w)
>>> lm.n
78
>>> len(lm.Is)
78
thus we see 78 LISAs are stored in the vector lm.Is. Inference about these values is obtained through conditional randomization [6] which leads to pseudo p-values for each LISA:
>>> lm.p_sim
array([ 0.176, 0.073, 0.405, 0.267, 0.332, 0.057, 0.296, 0.242,
0.055, 0.062, 0.273, 0.488, 0.44 , 0.354, 0.415, 0.478,
0.473, 0.374, 0.415, 0.21 , 0.161, 0.025, 0.338, 0.375,
0.285, 0.374, 0.208, 0.3 , 0.373, 0.411, 0.478, 0.414,
0.009, 0.429, 0.269, 0.015, 0.005, 0.002, 0.077, 0.001,
0.088, 0.459, 0.435, 0.365, 0.231, 0.017, 0.033, 0.04 ,
0.068, 0.101, 0.284, 0.309, 0.113, 0.457, 0.045, 0.269,
0.118, 0.346, 0.328, 0.379, 0.342, 0.39 , 0.376, 0.467,
0.357, 0.241, 0.26 , 0.401, 0.185, 0.172, 0.248, 0.4 ,
0.482, 0.159, 0.373, 0.455, 0.083, 0.128])
To identify the significant [7] LISA values we can use numpy indexing:
>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.025, 0.009, 0.015, 0.005, 0.002, 0.001, 0.017, 0.033,
0.04 , 0.045])
and then use this indexing on the q attribute to find out which quadrant of the Moran scatter plot each of the significant values is contained in:
>>> lm.q[sig]
array([4, 1, 3, 1, 3, 1, 1, 3, 3, 3])
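A minimal numpy sketch of the local statistics and the quadrant assignment, using hypothetical toy data; for binary weights the local values sum back to the global I up to the weight sum S0:

```python
import numpy as np

def local_morans_i(y, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j."""
    z = y - y.mean()
    m2 = (z @ z) / len(y)
    lag = w @ z                        # spatial lag of the deviations
    return z * lag / m2, z, lag

# hypothetical toy data on a ring of four units
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Is, z, lag = local_morans_i(y, w)
# Moran scatter plot quadrants: 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low
q = np.where(z > 0, np.where(lag > 0, 1, 4), np.where(lag > 0, 2, 3))
# for binary weights, the local values sum back to the global I up to S0 = w.sum()
i_global = Is.sum() / w.sum()
```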
As in the case of global Moran’s I, when the variable of interest is a rate based on populations with different sizes, we need to account for the differences among populations to estimate local Moran’s Is. Continuing on with the SIDS example above, the adjusted local Moran’s Is are obtained with:
>>> f = pysal.open(pysal.examples.get_path("sids2.dbf"))
>>> b = np.array(f.by_col('BIR79'))
>>> e = np.array(f.by_col('SID79'))
>>> w = pysal.open(pysal.examples.get_path("sids2.gal")).read()
>>> np.random.seed(12345)
>>> lm = pysal.esda.moran.Moran_Local_Rate(e, b, w)
>>> lm.Is[:10]
array([0.13452366, 1.21133985, 0.05019761, 0.06127125, 0.12627466,
0.23497679, 0.26345855, 0.00951288, 0.01517879, 0.34513514])
As demonstrated above, significant Moran’s Is can be identified by using numpy indexing:
>>> sig = lm.p_sim<0.05
>>> lm.p_sim[sig]
array([ 0.021, 0.04 , 0.047, 0.015, 0.001, 0.017, 0.032, 0.031,
0.019, 0.014, 0.004, 0.048, 0.003])
Local G and G*¶
Getis and Ord’s G can be localized in two forms: Gi and Gi*.
The two forms differ in their treatment of the value at the focal location i: Gi measures the spatial concentration of values among the neighbors of i, excluding the value at i itself, while Gi* includes it. Standardized versions of both statistics use the sample mean and variance of y.
Continuing on with the St. Louis example, the Gi and Gi* statistics are obtained with:
>>> from pysal.esda.getisord import G_Local
>>> np.random.seed(12345)
>>> lg = G_Local(y, dist_w)
>>> lg.n
78
>>> len(lg.Gs)
78
>>> lgstar = G_Local(y, dist_w, star=True)
>>> lgstar.n
78
>>> len(lgstar.Gs)
78
thus we see the 78 Gi and Gi* values are stored in the vectors lg.Gs and lgstar.Gs, respectively. Inference about these values is obtained through conditional randomization as in the case of local Moran’s I:
>>> lg.p_sim
array([ 0.301, 0.037, 0.457, 0.011, 0.062, 0.006, 0.094, 0.163,
0.075, 0.078, 0.419, 0.286, 0.138, 0.443, 0.36 , 0.484,
0.434, 0.251, 0.415, 0.21 , 0.177, 0.001, 0.304, 0.042,
0.285, 0.394, 0.208, 0.089, 0.244, 0.493, 0.478, 0.433,
0.006, 0.429, 0.037, 0.105, 0.005, 0.216, 0.23 , 0.023,
0.105, 0.343, 0.395, 0.305, 0.264, 0.017, 0.033, 0.01 ,
0.001, 0.115, 0.034, 0.225, 0.043, 0.312, 0.045, 0.092,
0.118, 0.428, 0.258, 0.379, 0.408, 0.39 , 0.475, 0.493,
0.357, 0.298, 0.232, 0.454, 0.149, 0.161, 0.226, 0.4 ,
0.482, 0.159, 0.27 , 0.423, 0.083, 0.128])
To identify the significant values we can use numpy indexing:
>>> sig = lg.p_sim<0.05
>>> lg.p_sim[sig]
array([ 0.037, 0.011, 0.006, 0.001, 0.042, 0.006, 0.037, 0.005,
0.023, 0.017, 0.033, 0.01 , 0.001, 0.034, 0.043, 0.045])
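The unstandardized forms of the two local statistics can be sketched as follows; the toy values and binary ring weights are hypothetical:

```python
import numpy as np

def local_g(y, w, star=False):
    """Unstandardized local Getis-Ord statistics for binary weights:
    G_i excludes the value at i, G_i* includes it (diagonal set to 1)."""
    y = np.asarray(y, dtype=float)
    if star:
        return (w + np.eye(len(y))) @ y / y.sum()
    return (w @ y) / (y.sum() - y)      # neighbors' total over everyone else's total

# hypothetical toy data on a ring of four units
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
gi = local_g(y, w)               # e.g. G_0 = (y_1 + y_3) / (y_1 + y_2 + y_3)
gistar = local_g(y, w, star=True)
```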
Further Information¶
For further details see the ESDA API.
Footnotes
[1]  Hubert, L., R. Golledge and C.M. Costanzo (1981). Generalized procedures for evaluating spatial autocorrelation. Geographical Analysis 13, 224-233. 
[2]  Technical details and derivations can be found in A.D. Cliff and J.K. Ord (1981). Spatial Processes, Models and Applications. London, Pion, pp. 34-41. 
[3]  Messner, S., L. Anselin, D. Hawkins, G. Deane, S. Tolnay, R. Baller (2000). An Atlas of the Spatial Patterning of County-Level Homicide, 1960-1990. Pittsburgh, PA, National Consortium on Violence Research (NCOVR) 
[4]  Because the permutations are random, results from those presented here may vary if you replicate this example. 
[5]  Assuncao, R. E. and Reis, E. A. 1999. A new proposal to adjust Moran’s I for population density. Statistics in Medicine. 18, 2147-2162. 
[6]  The n-1 spatial units other than i are used to generate the empirical distribution of the LISA statistics for each i. 
[7]  Caution is required in interpreting the significance of the LISA statistics due to difficulties with multiple comparisons and a lack of independence across the individual tests. For further discussion see Anselin, L. (1995). “Local indicators of spatial association – LISA”. Geographical Analysis, 27, 93-115. 
Spatial Econometrics¶
Comprehensive user documentation on spreg can be found in Anselin, L. and S.J. Rey (2014) Modern Spatial Econometrics in Practice: A Guide to GeoDa, GeoDaSpace and PySAL. GeoDa Press, Chicago.
Spatial Smoothing¶
Contents
Introduction¶
In the spatial analysis of attributes measured for areal units, it is often necessary to transform an extensive variable, such as number of disease cases per census tract, into an intensive variable that takes into account the underlying population at risk. Raw rates, counts divided by population values, are a common standardization in the literature, yet these tend to have unequal reliability due to different population sizes across the spatial units. This problem becomes severe for areas with small population values, since the raw rates for those areas tend to have higher variance.
A variety of spatial smoothing methods have been suggested to address this problem by aggregating the counts and population values for the areas neighboring an observation and using these new measurements for its rate computation. PySAL provides a range of smoothing techniques that exploit different types of moving windows and nonparametric weighting schemes as well as the Empirical Bayesian principle. In addition, PySAL offers several methods for calculating age-standardized rates, since age standardization is critical in estimating rates of events where the probability of occurrence differs across age groups.
In what follows, we overview the methods for age standardization and spatial smoothing and describe their implementations in PySAL. [1]
Age Standardization in PySAL¶
Raw rates, counts divided by population values, are based on an implicit assumption that the risk of an event is constant over all age/sex categories in the population. For many phenomena, however, the risk is not uniform and is often highly correlated with age. To take this into account explicitly, the risks for individual age categories can be estimated separately and averaged to produce a representative value for an area.
PySAL supports three approaches to this age standardization: crude, direct, and indirect standardization.
Crude Age Standardization¶
In this approach, the rate for an area is simply the sum of age-specific rates weighted by the ratios of each age group in the total population.
To obtain the rates based on this approach, we first need to create two variables that correspond to event counts and population values, respectively.
>>> import numpy as np
>>> e = np.array([30, 25, 25, 15, 33, 21, 30, 20])
>>> b = np.array([100, 100, 110, 90, 100, 90, 110, 90])
Each array should contain n by h elements, where n is the number of areal units and h is the number of age groups. In the above example there are two regions with 4 age groups each; the age groups are identical across regions. The first four elements in b represent the populations of the 4 age groups in the first region, and the last four elements the populations of the same age groups in the second region.
To apply the crude age standardization, we need to make the following function call:
>>> from pysal.esda import smoothing as sm
>>> sm.crude_age_standardization(e, b, 2)
array([ 0.2375 , 0.26666667])
In the function call above, the last argument indicates the number of areal units. The outcome in the second line shows that the age-standardized rates for the two areas are about 0.24 and 0.27, respectively.
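Because the weights are each region's own population shares, the crude standardized rate reduces to total events over total population. A numpy sketch that reproduces the output above:

```python
import numpy as np

e = np.array([30., 25., 25., 15., 33., 21., 30., 20.])
b = np.array([100., 100., 110., 90., 100., 90., 110., 90.])
h = 4                                                # age groups per region
r = e / b                                            # age-specific rates
shares = b / b.reshape(-1, h).sum(axis=1).repeat(h)  # each group's share of its region
crude = (r * shares).reshape(-1, h).sum(axis=1)
# crude -> array([ 0.2375    ,  0.26666667])
```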
Direct Age Standardization¶
Direct age standardization is a variation of the crude age standardization. While crude age standardization uses the ratios of each age group in the observed population, direct age standardization weights age-specific rates by the ratios of each age group in a reference population. This reference population, the so-called standard million, is another required argument in the PySAL implementation of direct age standardization:
>>> s = np.array([100, 90, 100, 90, 100, 90, 100, 90])
>>> rate = sm.direct_age_standardization(e, b, s, 2, alpha=0.05)
>>> np.array(rate).round(6)
array([[ 0.23744 , 0.192049, 0.290485],
[ 0.266507, 0.217714, 0.323051]])
The outcome of direct age standardization includes a set of standardized rates and their confidence intervals. The confidence intervals can vary according to the value for the last argument, alpha.
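The point estimates (without the confidence intervals) can be sketched by reweighting the observed age-specific rates with the standard million's population shares:

```python
import numpy as np

e = np.array([30., 25., 25., 15., 33., 21., 30., 20.])
b = np.array([100., 100., 110., 90., 100., 90., 110., 90.])
s = np.array([100., 90., 100., 90., 100., 90., 100., 90.])  # standard million
h = 4
r = e / b                                              # observed age-specific rates
shares = s / s.reshape(-1, h).sum(axis=1).repeat(h)    # standard population shares
direct = (r * shares).reshape(-1, h).sum(axis=1)
# direct -> approximately array([ 0.23744,  0.266507])
```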
Indirect Age Standardization¶
While direct age standardization effectively addresses the variation in risks across age groups, its indirect counterpart is better suited to handle the potential imprecision of age-specific rates due to small population sizes. This method uses age-specific rates from the standard million instead of the observed population. It then weights these rates by the ratios of each age group in the observed population. To compute the age-specific rates from the standard million, the PySAL implementation of indirect age standardization requires another argument that contains the counts of the events that occurred in the standard million.
>>> s_e = np.array([10, 15, 12, 10, 5, 3, 20, 8])
>>> rate = sm.indirect_age_standardization(e, b, s_e, s, 2, alpha=0.05)
>>> np.array(rate).round(6)
array([[ 0.208055, 0.170156, 0.254395],
[ 0.298892, 0.246631, 0.362228]])
The outcome of indirect age standardization is the same as that of its direct counterpart.
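A sketch of the underlying computation: expected events are obtained from the standard million's age-specific rates, and the resulting standardized mortality ratio is scaled by the standard's overall rate:

```python
import numpy as np

e = np.array([30., 25., 25., 15., 33., 21., 30., 20.])
b = np.array([100., 100., 110., 90., 100., 90., 110., 90.])
s = np.array([100., 90., 100., 90., 100., 90., 100., 90.])
s_e = np.array([10., 15., 12., 10., 5., 3., 20., 8.])
h = 4
std_rate = s_e / s                                     # standard age-specific rates
expected = (std_rate * b).reshape(-1, h).sum(axis=1)   # expected events per region
observed = e.reshape(-1, h).sum(axis=1)
smr = observed / expected                              # standardized mortality ratio
indirect = smr * (s_e.sum() / s.sum())                 # scale by the standard's overall rate
# indirect -> approximately array([ 0.208055,  0.298892])
```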
Spatial Smoothing in PySAL¶
Mean and Median Based Smoothing¶
A simple approach to rate smoothing is to find a local average or median from the rates of each observation and its neighbors. The first method adopting this approach is the so-called locally weighted averages or disk smoother. In this method the rate for each observation is replaced by an average of the rates of its neighbors. A spatial weights object is used to specify the neighborhood relationships among observations. To obtain locally weighted averages of the homicide rates in the counties surrounding St. Louis during 1979-84, we first read the corresponding data table and extract data values for the homicide counts (the 11th column) and total population (the 14th column):
>>> import pysal
>>> stl = pysal.open('../pysal/examples/stl_hom.csv', 'r')
>>> e, b = np.array(stl[:,10]), np.array(stl[:,13])
We then read the spatial weights file defining neighborhood relationships among the counties and ensure that the order of observations in the weights object is the same as that in the data table.
>>> w = pysal.open('../pysal/examples/stl.gal', 'r').read()
>>> if not w.id_order_set: w.id_order = range(1,len(stl) + 1)
Now we calculate locally weighted averages of the homicide rates.
>>> rate = sm.Disk_Smoother(e, b, w)
>>> rate.r
array([ 4.56502262e-05, 3.44027685e-05, 3.38280487e-05,
4.78530468e-05, 3.12278573e-05, 2.22596997e-05,
...
5.29577710e-05, 5.51034691e-05, 4.65160450e-05,
5.32513363e-05, 3.86199097e-05, 1.92952422e-05])
A variation of locally weighted averages is to use median instead of mean. In other words, the rate for an observation can be replaced by the median of the rates of its neighbors. This method is called locally weighted median and can be applied in the following way:
>>> rate = sm.Spatial_Median_Rate(e, b, w)
>>> rate.r
array([ 3.96047383e-05, 3.55386859e-05, 3.28308921e-05,
4.30731238e-05, 3.12453969e-05, 1.97300409e-05,
...
6.10668237e-05, 5.86355507e-05, 3.67396656e-05,
4.82535850e-05, 5.51831429e-05, 2.99877050e-05])
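Both smoothers can be sketched with plain numpy; the toy counts and the neighborhoods below (which here include the focal unit) are hypothetical, since the exact neighborhood definition follows the supplied weights object:

```python
import numpy as np

def disk_smoother(e, b, neighborhoods):
    """Replace each raw rate with the mean rate over its neighborhood."""
    r = e / b
    return np.array([r[n].mean() for n in neighborhoods])

def spatial_median_rate(e, b, neighborhoods):
    """Replace each raw rate with the median rate over its neighborhood."""
    r = e / b
    return np.array([np.median(r[n]) for n in neighborhoods])

# hypothetical counts, populations and ring neighborhoods (focal unit included)
e = np.array([10., 20., 5., 40.])
b = np.array([100., 150., 80., 200.])
hoods = [np.array([3, 0, 1]), np.array([0, 1, 2]),
         np.array([1, 2, 3]), np.array([2, 3, 0])]
smoothed_mean = disk_smoother(e, b, hoods)
smoothed_median = spatial_median_rate(e, b, hoods)
```

Since each smoothed value is an average or median of raw rates, the results always lie within the range of the raw rates.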
In this method the procedure to find local medians can be iterated until no further change occurs. The resulting local medians are called iteratively resmoothed medians.
>>> rate = sm.Spatial_Median_Rate(e, b, w, iteration=10)
>>> rate.r
array([ 3.10194715e-05, 2.98419439e-05, 3.10194715e-05,
3.10159267e-05, 2.99214885e-05, 2.80530524e-05,
...
3.81364519e-05, 4.72176972e-05, 3.75320135e-05,
3.76863269e-05, 4.72176972e-05, 3.75320135e-05])
The pure local medians can also be replaced by a weighted median. To obtain weighted medians, we need to create an array of weights. For example, we can use the total population of the counties as auxiliary weights:
>>> rate = sm.Spatial_Median_Rate(e, b, w, aw=b)
>>> rate.r
array([ 5.77412020e-05, 4.46449551e-05, 5.77412020e-05,
5.77412020e-05, 4.46449551e-05, 3.61363528e-05,
...
5.49703305e-05, 5.86355507e-05, 3.67396656e-05,
3.67396656e-05, 4.72176972e-05, 2.99877050e-05])
When obtaining locally weighted medians, we can consider only a specific subset of neighbors rather than all of them. A representative method following this approach is the headbanging smoother. In this method all areal units are represented by their geometric centroids. Among the neighbors of each observation, only near collinear points are considered for median search. Then, triples of points are selected from the near collinear points, and local medians are computed from the triples’ rates. [2] We apply this headbanging smoother to the rates of deaths from Sudden Infant Death Syndrome (SIDS) for North Carolina counties during 1974-78. We first need to read the source data and extract the event counts (the 10th column) and population values (the 9th column). In this example the population values correspond to the numbers of live births during 1974-78.
>>> sids_db = pysal.open('../pysal/examples/sids2.dbf', 'r')
>>> e, b = np.array(sids_db[:,9]), np.array(sids_db[:,8])
Now we need to find triples for each observation. To support the search of triples, PySAL provides a class called Headbanging_Triples. This class requires an array of point observations, a spatial weights object, and the number of triples as its arguments:
>>> from pysal import knnW
>>> sids = pysal.open('../pysal/examples/sids2.shp', 'r')
>>> sids_d = np.array([i.centroid for i in sids])
>>> sids_w = knnW(sids_d,k=5)
>>> if not sids_w.id_order_set: sids_w.id_order = sids_w.id_order
>>> triples = sm.Headbanging_Triples(sids_d,sids_w,k=5)
The third line in the above example shows how to extract the centroids of the polygons. In this example we define 5 neighbors for each observation using the nearest-neighbor criterion. In the last line we define the maximum number of triples to be found as 5.
Now we use the triples to compute the headbanging median rates:
>>> rate = sm.Headbanging_Median_Rate(e,b,triples)
>>> rate.r
array([ 0.00075586, 0. , 0.0008285 , 0.0018315 , 0.00498891,
0.00482094, 0.00133156, 0.0018315 , 0.00413223, 0.00142116,
...
0.00221541, 0.00354767, 0.00259903, 0.00392952, 0.00207125,
0.00392952, 0.00229253, 0.00392952, 0.00229253, 0.00229253])
As in the locally weighted medians, we can use a set of auxiliary weights and resmooth the medians iteratively.
Nonparametric Smoothing¶
Nonparametric smoothing methods compute rates without making any assumptions about the distributional properties of the rate estimates. A representative method in this approach is spatial filtering. PySAL provides the simplest form of spatial filtering, where a user-specified grid is imposed on the data set and a moving window with a fixed or adaptive radius visits each vertex of the grid to compute the rate at that vertex. Using the previous SIDS example, we can use the Spatial_Filtering class:
>>> bbox = [sids.bbox[:2], sids.bbox[2:]]
>>> rate = sm.Spatial_Filtering(bbox, sids_d, e, b, 10, 10, r=1.5)
>>> rate.r
array([ 0.00152555, 0.00079271, 0.00161253, 0.00161253, 0.00139513,
0.00139513, 0.00139513, 0.00139513, 0.00139513, 0.00156348,
...
0.00240216, 0.00237389, 0.00240641, 0.00242211, 0.0024854 ,
0.00255477, 0.00266573, 0.00288918, 0.0028991 , 0.00293492])
The first and second arguments of the Spatial_Filtering class are a minimum bounding box containing the observations and a set of centroids representing the observations. Be careful: the required bounding box is that of the polygons, NOT the bounding box of the centroids. The fifth and sixth arguments specify the numbers of grid cells along the x and y axes. The last argument, r, defines the radius of the moving window. When this parameter is set, a fixed radius is applied at all grid vertices. To make the size of the moving window variable, we can specify the minimum population the moving window must capture, instead of r:
>>> rate = sm.Spatial_Filtering(bbox, sids_d, e, b, 10, 10, pop=10000)
>>> rate.r
array([ 0.00157398, 0.00157398, 0.00157398, 0.00157398, 0.00166885,
0.00166885, 0.00166885, 0.00166885, 0.00166885, 0.00166885,
...
0.00202977, 0.00215322, 0.00207378, 0.00207378, 0.00217173,
0.00232408, 0.00222717, 0.00245399, 0.00267857, 0.00267857])
The spatial rate smoother is another nonparametric smoothing method that PySAL supports. This smoother is very similar to the locally weighted averages. In this method, however, the weighted sum is applied to event counts and population values separately. The resulting weighted sum of event counts is then divided by the counterpart of population values. To obtain neighbor information, we need to use a spatial weights matrix as before.
>>> rate = sm.Spatial_Rate(e, b, sids_w)
>>> rate.r
array([ 0.00114976, 0.00104622, 0.00110001, 0.00153257, 0.00399662,
0.00361428, 0.00146807, 0.00238521, 0.00288871, 0.00145228,
...
0.00240839, 0.00376101, 0.00244941, 0.0028813 , 0.00240839,
0.00261705, 0.00226554, 0.0031575 , 0.00254536, 0.0029003 ])
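A sketch of the spatial rate smoother with plain numpy; because both sums use the same weights, each smoothed rate is a weighted mean of the raw rates and therefore stays within their range. The toy data are hypothetical, and whether the focal unit enters its own window depends on the weights specification:

```python
import numpy as np

def spatial_rate(e, b, w):
    """Weighted sums of events and populations are formed separately, then divided."""
    return (w @ e) / (w @ b)

# hypothetical counts, populations, and ring weights with the focal unit included
e = np.array([10., 20., 5., 40.])
b = np.array([100., 150., 80., 200.])
w = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
smoothed = spatial_rate(e, b, w)
```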
Another variation of the spatial rate smoother is the kernel smoother. PySAL supports kernel smoothing by using a kernel spatial weights instance in place of a general spatial weights object.
>>> from pysal import Kernel
>>> kw = Kernel(sids_d)
>>> if not kw.id_order_set: kw.id_order = range(0,len(sids_d))
>>> rate = sm.Kernel_Smoother(e, b, kw)
>>> rate.r
array([ 0.0009831 , 0.00104298, 0.00137113, 0.00166406, 0.00556741,
0.00442273, 0.00158202, 0.00243354, 0.00282158, 0.00099243,
...
0.00221017, 0.00328485, 0.00257988, 0.00370461, 0.0020566 ,
0.00378135, 0.00240358, 0.00432019, 0.00227857, 0.00251648])
The age-adjusted rate smoother is another nonparametric smoother that PySAL provides. This smoother applies direct age standardization while computing spatial rates. To illustrate the age-adjusted rate smoother, we create a new set of event counts and population values as well as a new kernel weights object.
>>> e = np.array([10, 8, 1, 4, 3, 5, 4, 3, 2, 1, 5, 3])
>>> b = np.array([100, 90, 15, 30, 25, 20, 30, 20, 80, 80, 90, 60])
>>> s = np.array([98, 88, 15, 29, 20, 23, 33, 25, 76, 80, 89, 66])
>>> points=[(10, 10), (20, 10), (40, 10), (15, 20), (30, 20), (30, 30)]
>>> kw=Kernel(points)
>>> if not kw.id_order_set: kw.id_order = range(0,len(points))
In the above example we created 6 observations each of which has two age groups. To apply ageadjusted rate smoothing, we use the Age_Adjusted_Smoother class as follows:
>>> rate = sm.Age_Adjusted_Smoother(e, b, kw, s)
>>> rate.r
array([ 0.10519625, 0.08494318, 0.06440072, 0.06898604, 0.06952076,
0.05020968])
Empirical Bayes Smoothers¶
The last group of smoothing methods that PySAL supports is based upon the Bayesian principle. These methods adjust a raw rate by taking into account information in the other raw rates. As a reference PySAL provides a method for aspatial Empirical Bayes smoothing:
>>> e, b = sm.sum_by_n(e, np.ones(12), 6), sm.sum_by_n(b, np.ones(12), 6)
>>> rate = sm.Empirical_Bayes(e, b)
>>> rate.r
array([ 0.09080775, 0.09252352, 0.12332267, 0.10753624, 0.03301368,
0.05934766])
In the first line of the above example we aggregate the event counts and population values by observation. Next, we apply the Empirical_Bayes class to the aggregated counts and population values.
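The moment estimators behind the aspatial smoother can be sketched with plain numpy; with the aggregated counts from the example, this reproduces the rates shown above (up to rounding):

```python
import numpy as np

# the six aggregated counts and populations from the example above
e = np.array([18., 5., 8., 7., 3., 8.])
b = np.array([190., 45., 45., 50., 160., 150.])
r = e / b
m = e.sum() / b.sum()                        # overall rate (prior mean)
s2 = (b * (r - m) ** 2).sum() / b.sum()      # weighted variance of the raw rates
a = s2 - m / b.mean()                        # between-area variance component
v = a + m / b                                # variance estimate per raw rate
eb = m + (r - m) * (a / v)                   # shrink each raw rate toward m
# eb -> approximately array([ 0.09080775, 0.09252352, 0.12332267,
#                             0.10753624, 0.03301368, 0.05934766])
```

Areas with small populations (large m/b) receive more shrinkage toward the overall rate.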
A spatial Empirical Bayes smoother is also implemented in PySAL. This method requires an additional argument, i.e., a spatial weights object. We continue to reuse the kernel spatial weights object we built before.
>>> rate = sm.Spatial_Empirical_Bayes(e, b, kw)
>>> rate.r
array([ 0.10105263, 0.10165261, 0.16104362, 0.11642038, 0.0226908 ,
0.05270639])
Excess Risk¶
Besides a variety of spatial smoothing methods, PySAL provides a class for estimating excess risk from event counts and population values. Excess risks are the ratios of observed event counts over expected event counts. An example for the class usage is as follows:
>>> risk = sm.Excess_Risk(e, b)
>>> risk.r
array([ 1.23737916, 1.45124717, 2.32199546, 1.82857143, 0.24489796,
0.69659864])
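A sketch of the computation: expected events are the overall rate times each area's population, and the excess risk is the ratio of observed to expected events. With the aggregated counts from the example this reproduces the output above:

```python
import numpy as np

e = np.array([18., 5., 8., 7., 3., 8.])
b = np.array([190., 45., 45., 50., 160., 150.])
m = e.sum() / b.sum()        # overall rate
expected = m * b             # expected events under the overall rate
risk = e / expected          # excess risk (observed over expected)
# risk -> approximately array([ 1.23737916, 1.45124717, 2.32199546,
#                               1.82857143, 0.24489796, 0.69659864])
```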
Further Information¶
For further details see the Smoothing API.
Footnotes
[1]  Although this tutorial provides an introduction to the PySAL implementations for spatial smoothing, it is not exhaustive. Complete documentation for the implementations can be found by accessing the help from within a Python interpreter. 
[2]  For the details of triple selection and headbanging smoothing please refer to Anselin, L., Lozano, N., and Koschinsky, J. (2006). “Rate Transformations and Smoothing”. GeoDa Center Research Report. 
Regionalization¶
Introduction¶
PySAL offers a number of tools for the construction of regions. For the purposes of this section, a “region” is a group of “areas,” and there are generally multiple regions in a particular dataset. At this time, PySAL offers the max-p regionalization algorithm and tools for constructing random regions.
max-p¶
Most regionalization algorithms require the user to define a priori the number of regions to be built (e.g. k-means clustering). The max-p algorithm [1] determines the number of regions (p) endogenously based on a set of areas, a matrix of attributes on each area and a floor constraint. The floor constraint defines the minimum bound that a variable must reach for each region; for example, a constraint might be the minimum population each region must have. max-p further enforces a contiguity constraint on the areas within regions.
To illustrate this we will use data on per capita income from the lower 48 US states over the period 1929-2009. The goal is to form contiguous regions of states displaying similar levels of income throughout this period:
>>> import pysal
>>> import numpy as np
>>> import random
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929, 2010)])
>>> pci = pci.transpose()
>>> pci.shape
(48, 81)
We also require a set of binary contiguity weights for the Maxp class:
>>> w = pysal.open("../pysal/examples/states48.gal").read()
Once we have the attribute data and our weights object we can create an instance of Maxp:
>>> np.random.seed(100)
>>> random.seed(10)
>>> r = pysal.Maxp(w, pci, floor = 5, floor_variable = np.ones((48, 1)), initial = 99)
Here we are forming regions with a minimum of 5 states in each region, so we set the floor_variable to a simple unit vector to ensure this floor constraint is satisfied. We also set the initial number of feasible solutions to 99; these are then searched over to pick the best feasible solution before commencing with the more expensive swapping component of the algorithm. [2]
The Maxp instance r has a number of attributes regarding the solution. First is the definition of the regions:
>>> r.regions
[['44', '34', '3', '25', '1', '4', '47'], ['12', '46', '20', '24', '13'], ['14', '45', '35', '30', '39'], ['6', '27', '17', '29', '5', '43'], ['33', '40', '28', '15', '41', '9', '23', '31', '38'], ['37', '8', '0', '7', '21', '2'], ['32', '19', '11', '10', '22'], ['16', '26', '42', '18', '36']]
which is a list of eight lists of region ids. For example, the first nested list indicates there are seven states in the first region, while the last region has five states. To determine which states these are we can read in the names from the original csv file:
>>> f.header
['Name', 'STATE_FIPS', '1929', '1930', '1931', '1932', '1933', '1934', '1935', '1936', '1937', '1938', '1939', '1940', '1941', '1942', '1943', '1944', '1945', '1946', '1947', '1948', '1949', '1950', '1951', '1952', '1953', '1954', '1955', '1956', '1957', '1958', '1959', '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009']
>>> names = f.by_col('Name')
>>> names = np.array(names)
>>> print names
['Alabama' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut'
'Delaware' 'Florida' 'Georgia' 'Idaho' 'Illinois' 'Indiana' 'Iowa'
'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts'
'Michigan' 'Minnesota' 'Mississippi' 'Missouri' 'Montana' 'Nebraska'
'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York'
'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma' 'Oregon' 'Pennsylvania'
'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah'
'Vermont' 'Virginia' 'Washington' 'West Virginia' 'Wisconsin' 'Wyoming']
and then loop over the region definitions to identify the specific states comprising each of the regions:
>>> for region in r.regions:
... ids = map(int,region)
... print names[ids]
...
['Washington' 'Oregon' 'California' 'Nevada' 'Arizona' 'Colorado' 'Wyoming']
['Iowa' 'Wisconsin' 'Minnesota' 'Nebraska' 'Kansas']
['Kentucky' 'West Virginia' 'Pennsylvania' 'North Carolina' 'Tennessee']
['Delaware' 'New Jersey' 'Maryland' 'New York' 'Connecticut' 'Virginia']
['Oklahoma' 'Texas' 'New Mexico' 'Louisiana' 'Utah' 'Idaho' 'Montana'
'North Dakota' 'South Dakota']
['South Carolina' 'Georgia' 'Alabama' 'Florida' 'Mississippi' 'Arkansas']
['Ohio' 'Michigan' 'Indiana' 'Illinois' 'Missouri']
['Maine' 'New Hampshire' 'Vermont' 'Massachusetts' 'Rhode Island']
We can evaluate our solution by developing a pseudo p-value for the regionalization. This is done by comparing the within-region sum of squares for the solution against simulated solutions where areas are randomly assigned to regions that maintain the cardinality of the original solution. This method must be explicitly called once the Maxp instance has been created:
>>> r.inference()
>>> r.pvalue
0.01
so we see we have a regionalization that is significantly different than a chance partitioning.
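The inference step can be sketched as a permutation test on the within-region sum of squares; the data and the fixed cardinalities below are hypothetical:

```python
import numpy as np

def within_region_ss(X, regions):
    """Total within-region sum of squared deviations from region means."""
    return sum(((X[ids] - X[ids].mean(axis=0)) ** 2).sum() for ids in regions)

rng = np.random.RandomState(100)
X = rng.normal(size=(12, 3))          # hypothetical: 12 areas, 3 attributes
solution = [np.array([0, 1, 2, 3]), np.array([4, 5, 6, 7]), np.array([8, 9, 10, 11])]
obs = within_region_ss(X, solution)

# random assignments preserving the cardinality of the solution
sims = []
for _ in range(999):
    perm = rng.permutation(len(X))
    sims.append(within_region_ss(X, [perm[:4], perm[4:8], perm[8:]]))
sims = np.array(sims)
# share of random solutions at least as compact as the observed one
p_value = ((sims <= obs).sum() + 1.0) / (len(sims) + 1.0)
```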
Random Regions¶
PySAL offers functionality to generate random regions based on user-defined constraints. There are three optional parameters to constrain the regionalization: number of regions, cardinality and contiguity. The default case simply takes a list of area IDs and randomly selects the number of regions and then allocates areas to each region. The user can also pass a vector of integers to the cardinality parameter to designate the number of areas to randomly assign to each region. The contiguity parameter takes a spatial weights object and uses that to ensure that each region is made up of spatially contiguous areas. When the contiguity constraint is enforced, it is possible to arrive at infeasible solutions; the maxiter parameter can be set to make multiple attempts to find a feasible solution. The following examples show some of the possible combinations of constraints.
>>> import random
>>> import numpy as np
>>> import pysal
>>> from pysal.region import Random_Region
>>> nregs = 13
>>> cards = range(2,14) + [10]
>>> w = pysal.lat2W(10,10,rook = False)
>>> ids = w.id_order
>>>
>>> # unconstrained
>>> random.seed(10)
>>> np.random.seed(10)
>>> t0 = Random_Region(ids)
>>> t0.regions[0]
[19, 14, 43, 37, 66, 3, 79, 41, 38, 68, 2, 1, 60]
>>> # cardinality and contiguity constrained (num_regions implied)
>>> random.seed(60)
>>> np.random.seed(60)
>>> t1 = pysal.region.Random_Region(ids, num_regions = nregs, cardinality = cards, contiguity = w)
>>> t1.regions[0]
[88, 97, 98, 89, 99, 86, 78, 59, 49, 69, 68, 79, 77]
>>> # cardinality constrained (num_regions implied)
>>> random.seed(100)
>>> np.random.seed(100)
>>> t2 = Random_Region(ids, num_regions = nregs, cardinality = cards)
>>> t2.regions[0]
[37, 62]
>>> # number of regions and contiguity constrained
>>> random.seed(100)
>>> np.random.seed(100)
>>> t3 = Random_Region(ids, num_regions = nregs, contiguity = w)
>>> t3.regions[1]
[71, 72, 70, 93, 51, 91, 85, 74, 63, 73, 61, 62, 82]
>>> # cardinality and contiguity constrained
>>> random.seed(60)
>>> np.random.seed(60)
>>> t4 = Random_Region(ids, cardinality = cards, contiguity = w)
>>> t4.regions[0]
[88, 97, 98, 89, 99, 86, 78, 59, 49, 69, 68, 79, 77]
>>> # number of regions constrained
>>> random.seed(100)
>>> np.random.seed(100)
>>> t5 = Random_Region(ids, num_regions = nregs)
>>> t5.regions[0]
[37, 62, 26, 41, 35, 25, 36]
>>> # cardinality constrained
>>> random.seed(100)
>>> np.random.seed(100)
>>> t6 = Random_Region(ids, cardinality = cards)
>>> t6.regions[0]
[37, 62]
>>> # contiguity constrained
>>> random.seed(100)
>>> np.random.seed(100)
>>> t7 = Random_Region(ids, contiguity = w)
>>> t7.regions[0]
[37, 27, 36, 17]
>>>
Further Information¶
For further details see the Regionalization API.
Footnotes
[1]  Duque, J. C., L. Anselin and S. J. Rey. 2011. “The max-p-regions problem.” Journal of Regional Science DOI: 10.1111/j.1467-9787.2011.00743.x 
[2]  Because this is a randomized algorithm, results may vary when replicating this example. To reproduce a regionalization solution, you should first seed the random number generator. See http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html for more information. 
Spatial Dynamics¶
Introduction¶
PySAL implements a number of exploratory approaches to analyze the dynamics of longitudinal spatial data, or observations on fixed areal units over multiple time periods. Examples could include time series of voting patterns in US Presidential elections, time series of remote sensing images, labor market dynamics, regional business cycles, among many others. Two broad sets of spatial dynamics methods are implemented to analyze these data types. The first are Markov based methods, while the second are based on Rank dynamics.
Additionally, methods are included in this module to analyze patterns of individual events which have spatial and temporal coordinates associated with them. Examples include locations and times of individual cases of disease or crimes. Methods are included here to determine if these event patterns exhibit space-time interaction.
Markov Based Methods¶
The Markov based methods include classic Markov chains and extensions of these approaches to deal with spatially referenced data. In what follows we illustrate the functionality of these Markov methods. Readers interested in the methodological foundations of these approaches are directed to [1].
Classic Markov¶
We start with a look at a simple example of classic Markov methods implemented in PySAL. A Markov chain may be in one of different states at any point in time. These states are exhaustive and mutually exclusive. For example, if one had a time series of remote sensing images used to develop land use classifications, then the states could be defined as the specific land use classes and interest would center on the transitions in and out of different classes for each pixel.
For example, let’s construct a small artificial chain consisting of 3 states (a,b,c) and 5 different pixels at three different points in time:
>>> import pysal
>>> import numpy as np
>>> c = np.array([['b','a','c'],['c','c','a'],['c','b','c'],['a','a','b'],['a','b','c']])
>>> c
array([['b', 'a', 'c'],
['c', 'c', 'a'],
['c', 'b', 'c'],
['a', 'a', 'b'],
['a', 'b', 'c']],
dtype='S1')
So the first pixel was in class ‘b’ in period 1, class ‘a’ in period 2, and class ‘c’ in period 3. We can summarize the overall transition dynamics for the set of pixels by treating it as a Markov chain:
>>> m = pysal.Markov(c)
>>> m.classes
array(['a', 'b', 'c'],
dtype='S1')
The Markov instance m has an attribute, classes, extracted from the chain; the assumption is that the observations are on the rows of the input and the different points in time on the columns. In addition to extracting the classes as an attribute, our Markov instance will also have a transitions matrix:
>>> m.transitions
array([[ 1., 2., 1.],
[ 1., 0., 2.],
[ 1., 1., 1.]])
indicating that of the four pixels that began a transition interval in class ‘a’, 1 remained in that class, 2 transitioned to class ‘b’ and 1 transitioned to class ‘c’.
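The tally behind the transitions matrix can be sketched in a few lines of plain Python. The helper below (count_transitions is a hypothetical name, not part of PySAL) counts moves between consecutive time periods for each row of the chain:

```python
# A from-scratch sketch of how the transitions matrix is tallied: for each
# row (pixel), count moves between consecutive time periods.
def count_transitions(chain, classes):
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    for row in chain:
        for t in range(len(row) - 1):
            counts[idx[row[t]]][idx[row[t + 1]]] += 1
    return counts

c = [['b', 'a', 'c'], ['c', 'c', 'a'], ['c', 'b', 'c'],
     ['a', 'a', 'b'], ['a', 'b', 'c']]
count_transitions(c, ['a', 'b', 'c'])
# [[1, 2, 1], [1, 0, 2], [1, 1, 1]], matching m.transitions
```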
This simple example illustrates the basic creation of a Markov instance, but the small sample size makes it unrealistic for the more advanced features of this approach. For a larger example, we will look at an application of Markov methods to understanding regional income dynamics in the US. Here we will load in data on per capita income observed annually from 1929 to 2009 for the lower 48 US states:
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929,2010)])
>>> pci.shape
(81, 48)
The first row of the array is the per capita income for the first year:
>>> pci[0, :]
array([ 323, 600, 310, 991, 634, 1024, 1032, 518, 347, 507, 948,
607, 581, 532, 393, 414, 601, 768, 906, 790, 599, 286,
621, 592, 596, 868, 686, 918, 410, 1152, 332, 382, 771,
455, 668, 772, 874, 271, 426, 378, 479, 551, 634, 434,
741, 460, 673, 675])
In order to apply the classic Markov approach to this series, we first have to discretize the distribution by defining our classes. There are many ways to do this, but here we will use the quintiles for each annual income distribution to define the classes:
>>> q5 = np.array([pysal.Quantiles(y).yb for y in pci]).transpose()
>>> q5.shape
(48, 81)
>>> q5[:, 0]
array([0, 2, 0, 4, 2, 4, 4, 1, 0, 1, 4, 2, 2, 1, 0, 1, 2, 3, 4, 4, 2, 0, 2,
2, 2, 4, 3, 4, 0, 4, 0, 0, 3, 1, 3, 3, 4, 0, 1, 0, 1, 2, 2, 1, 3, 1,
3, 3])
A number of things need to be noted here. First, we are relying on the classification methods in PySAL for defining our quintiles. The class Quantiles uses quintiles as the default and will create an instance of this class that has multiple attributes; the one we are extracting in the first line is yb, the class id for each observation. The second thing to note is the transpose operator, which gets our resulting array q5 into the structure required for use of Markov. Thus we see that the first spatial unit (Alabama with an income of 323) fell in the first quintile in 1929, while the last unit (Wyoming with an income of 675) fell in the fourth quintile [2].
So now we have a time series for each state of its quintile membership. For example, Colorado’s quintile time series is:
>>> q5[4, :]
array([2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 3,
3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4,
4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 3])
indicating that it has occupied the 3rd, 4th and 5th quintiles in the distribution at different points in time. To summarize the transition dynamics for all units, we instantiate a Markov object:
>>> m5 = pysal.Markov(q5)
>>> m5.transitions
array([[ 729., 71., 1., 0., 0.],
[ 72., 567., 80., 3., 0.],
[ 0., 81., 631., 86., 2.],
[ 0., 3., 86., 573., 56.],
[ 0., 0., 1., 57., 741.]])
Assuming we can treat these transitions as a first order Markov chain, we can estimate the transition probabilities:
>>> m5.p
matrix([[ 0.91011236, 0.0886392 , 0.00124844, 0. , 0. ],
[ 0.09972299, 0.78531856, 0.11080332, 0.00415512, 0. ],
[ 0. , 0.10125 , 0.78875 , 0.1075 , 0.0025 ],
[ 0. , 0.00417827, 0.11977716, 0.79805014, 0.07799443],
[ 0. , 0. , 0.00125156, 0.07133917, 0.92740926]])
as well as the long run steady state distribution:
>>> m5.steady_state
matrix([[ 0.20774716],
[ 0.18725774],
[ 0.20740537],
[ 0.18821787],
[ 0.20937187]])
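Both quantities can be recovered from first principles: the transition probability matrix is simply the transitions matrix with each row divided by its sum, and the steady state is the left fixed point of that matrix, which power iteration finds for a regular chain. A minimal pure-Python sketch using the small 3-class toy chain from above (row_normalize and steady_state are hypothetical helper names, not PySAL's implementation):

```python
# p is the row-normalized transitions matrix; the steady state pi
# satisfies pi = pi * p.
def row_normalize(t):
    return [[v / float(sum(row)) for v in row] for row in t]

def steady_state(p, iters=1000):
    # power iteration: repeatedly propagate a uniform start through p
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

t = [[1.0, 2.0, 1.0], [1.0, 0.0, 2.0], [1.0, 1.0, 1.0]]  # toy chain
p = row_normalize(t)
pi = steady_state(p)
```

Each row of p sums to one, and pi is unchanged by a further application of p.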
With the transition probability matrix in hand, we can estimate the first mean passage time:
>>> pysal.ergodic.fmpt(m5.p)
matrix([[ 4.81354357, 11.50292712, 29.60921231, 53.38594954,
103.59816743],
[ 42.04774505, 5.34023324, 18.74455332, 42.50023268,
92.71316899],
[ 69.25849753, 27.21075248, 4.82147603, 25.27184624,
75.43305672],
[ 84.90689329, 42.85914824, 17.18082642, 5.31299186,
51.60953369],
[ 98.41295543, 56.36521038, 30.66046735, 14.21158356,
4.77619083]])
Thus, for a state with income in the first quintile, it takes on average 11.5 years for it to first enter the second quintile, 29.6 to get to the third quintile, 53.4 years to enter the fourth, and 103.6 years to reach the richest quintile.
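The first mean passage times solve a simple linear system: the passage time from i to j is one step plus the probability-weighted passage times from wherever the chain moves next, excluding arrivals at j. The sketch below solves this by fixed-point iteration for a toy two-state chain whose answer can be checked by hand (fmpt_iter is a hypothetical name and says nothing about how pysal.ergodic.fmpt is actually implemented):

```python
# m[i][j] = 1 + sum_{k != j} p[i][k] * m[k][j]; iterate to the fixed point.
def fmpt_iter(p, iters=5000):
    n = len(p)
    m = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = [0.0] * n
        for _ in range(iters):
            col = [1.0 + sum(p[i][k] * col[k] for k in range(n) if k != j)
                   for i in range(n)]
        for i in range(n):
            m[i][j] = col[i]
    return m

p = [[0.9, 0.1], [0.5, 0.5]]  # toy two-state chain, not the US data
m = fmpt_iter(p)
# m[0][1] = 1 / 0.1 = 10 steps, on average, to first reach state 1 from 0
```

The diagonal entries are mean return times, which for this chain equal the reciprocals of the steady state probabilities.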
Spatial Markov¶
Thus far we have treated all the spatial units as independent to estimate the transition probabilities. This hides a number of implicit assumptions. First, the transition dynamics are assumed to hold for all units and for all time periods. Second, interactions between the transitions of individual units are ignored. In other words regional context may be important to understand regional income dynamics, but the classic Markov approach is silent on this issue.
PySAL includes a number of spatially explicit extensions to the Markov framework. The first is the Spatial Markov class that we illustrate here. We first transform the income series to relative incomes (by dividing each period by its cross-sectional mean):
>>> import pysal
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929, 2010)])
>>> pci = pci.transpose()
>>> rpci = pci / (pci.mean(axis = 0))
Next, we require a spatial weights object, and here we will create one from an external GAL file:
>>> w = pysal.open("../pysal/examples/states48.gal").read()
>>> w.transform = 'r'
Finally, we create an instance of the Spatial Markov class using 5 states for the chain:
>>> sm = pysal.Spatial_Markov(rpci, w, fixed = True, k = 5)
Here we are keeping the quintiles fixed, meaning the data are pooled over space and time and the quintiles calculated for the pooled data. This is why we first transformed the data to relative incomes. We can next examine the global transition probability matrix for relative incomes:
>>> sm.p
matrix([[ 0.91461837, 0.07503234, 0.00905563, 0.00129366, 0. ],
[ 0.06570302, 0.82654402, 0.10512484, 0.00131406, 0.00131406],
[ 0.00520833, 0.10286458, 0.79427083, 0.09505208, 0.00260417],
[ 0. , 0.00913838, 0.09399478, 0.84856397, 0.04830287],
[ 0. , 0. , 0. , 0.06217617, 0.93782383]])
The Spatial Markov allows us to compare the global transition dynamics to those conditioned on regional context. More specifically, the transition dynamics are split across economies whose spatial lags are in different quintiles at the beginning of the year. In our example we have 5 classes, so 5 different conditioned transition probability matrices are estimated:
>>> for p in sm.P:
... print p
...
[[ 0.96341463 0.0304878 0.00609756 0. 0. ]
[ 0.06040268 0.83221477 0.10738255 0. 0. ]
[ 0. 0.14 0.74 0.12 0. ]
[ 0. 0.03571429 0.32142857 0.57142857 0.07142857]
[ 0. 0. 0. 0.16666667 0.83333333]]
[[ 0.79831933 0.16806723 0.03361345 0. 0. ]
[ 0.0754717 0.88207547 0.04245283 0. 0. ]
[ 0.00537634 0.06989247 0.8655914 0.05913978 0. ]
[ 0. 0. 0.06372549 0.90196078 0.03431373]
[ 0. 0. 0. 0.19444444 0.80555556]]
[[ 0.84693878 0.15306122 0. 0. 0. ]
[ 0.08133971 0.78947368 0.1291866 0. 0. ]
[ 0.00518135 0.0984456 0.79274611 0.0984456 0.00518135]
[ 0. 0. 0.09411765 0.87058824 0.03529412]
[ 0. 0. 0. 0.10204082 0.89795918]]
[[ 0.8852459 0.09836066 0. 0.01639344 0. ]
[ 0.03875969 0.81395349 0.13953488 0. 0.00775194]
[ 0.0049505 0.09405941 0.77722772 0.11881188 0.0049505 ]
[ 0. 0.02339181 0.12865497 0.75438596 0.09356725]
[ 0. 0. 0. 0.09661836 0.90338164]]
[[ 0.33333333 0.66666667 0. 0. 0. ]
[ 0.0483871 0.77419355 0.16129032 0.01612903 0. ]
[ 0.01149425 0.16091954 0.74712644 0.08045977 0. ]
[ 0. 0.01036269 0.06217617 0.89637306 0.03108808]
[ 0. 0. 0. 0.02352941 0.97647059]]
The probability of a poor state remaining poor is 0.963 if their neighbors are in the 1st quintile and 0.798 if their neighbors are in the 2nd quintile. The probability of a rich economy remaining rich is 0.977 if their neighbors are in the 5th quintile, but if their neighbors are in the 4th quintile this drops to 0.903.
We can also explore the different steady state distributions implied by these different transition probabilities:
>>> sm.S
array([[ 0.43509425, 0.2635327 , 0.20363044, 0.06841983, 0.02932278],
[ 0.13391287, 0.33993305, 0.25153036, 0.23343016, 0.04119356],
[ 0.12124869, 0.21137444, 0.2635101 , 0.29013417, 0.1137326 ],
[ 0.0776413 , 0.19748806, 0.25352636, 0.22480415, 0.24654013],
[ 0.01776781, 0.19964349, 0.19009833, 0.25524697, 0.3372434 ]])
The long run distribution for states with poor (rich) neighbors has 0.435 (0.018) of the values in the first quintile, 0.263 (0.200) in the second quintile, 0.204 (0.190) in the third, 0.0684 (0.255) in the fourth and 0.029 (0.337) in the fifth quintile. And, finally the first mean passage times:
>>> for f in sm.F:
... print f
...
[[ 2.29835259 28.95614035 46.14285714 80.80952381 279.42857143]
[ 33.86549708 3.79459555 22.57142857 57.23809524 255.85714286]
[ 43.60233918 9.73684211 4.91085714 34.66666667 233.28571429]
[ 46.62865497 12.76315789 6.25714286 14.61564626 198.61904762]
[ 52.62865497 18.76315789 12.25714286 6. 34.1031746 ]]
[[ 7.46754205 9.70574606 25.76785714 74.53116883 194.23446197]
[ 27.76691978 2.94175577 24.97142857 73.73474026 193.4380334 ]
[ 53.57477715 28.48447637 3.97566318 48.76331169 168.46660482]
[ 72.03631562 46.94601483 18.46153846 4.28393653 119.70329314]
[ 77.17917276 52.08887197 23.6043956 5.14285714 24.27564033]]
[[ 8.24751154 6.53333333 18.38765432 40.70864198 112.76732026]
[ 47.35040872 4.73094099 11.85432099 34.17530864 106.23398693]
[ 69.42288828 24.76666667 3.794921 22.32098765 94.37966594]
[ 83.72288828 39.06666667 14.3 3.44668119 76.36702977]
[ 93.52288828 48.86666667 24.1 9.8 8.79255406]]
[[ 12.87974382 13.34847151 19.83446328 28.47257282 55.82395142]
[ 99.46114206 5.06359731 10.54545198 23.05133495 49.68944423]
[ 117.76777159 23.03735526 3.94436301 15.0843986 43.57927247]
[ 127.89752089 32.4393006 14.56853107 4.44831643 31.63099455]
[ 138.24752089 42.7893006 24.91853107 10.35 4.05613474]]
[[ 56.2815534 1.5 10.57236842 27.02173913 110.54347826]
[ 82.9223301 5.00892857 9.07236842 25.52173913 109.04347826]
[ 97.17718447 19.53125 5.26043557 21.42391304 104.94565217]
[ 127.1407767 48.74107143 33.29605263 3.91777427 83.52173913]
[ 169.6407767 91.24107143 75.79605263 42.5 2.96521739]]
States with incomes in the first quintile with neighbors in the first quintile return to the first quintile after 2.298 years, after leaving the first quintile. They enter the fourth quintile 80.810 years after leaving the first quintile, on average. Poor states with neighbors in the fourth quintile return to the first quintile, on average, after 12.88 years, and would enter the fourth quintile after 28.473 years.
LISA Markov¶
The Spatial Markov conditions the transitions on the value of the spatial lag for an observation at the beginning of the transition period. An alternative approach to spatial dynamics is to consider the joint transitions of an observation and its spatial lag in the distribution. By exploiting the form of the static LISA and embedding it in a dynamic context we develop the LISA Markov in which the states of the chain are defined as the four quadrants in the Moran scatter plot. Continuing on with our US example:
>>> import numpy as np
>>> f = pysal.open("../pysal/examples/usjoin.csv")
>>> pci = np.array([f.by_col[str(y)] for y in range(1929, 2010)]).transpose()
>>> w = pysal.open("../pysal/examples/states48.gal").read()
>>> lm = pysal.LISA_Markov(pci, w)
>>> lm.classes
array([1, 2, 3, 4])
The LISA transitions are:
>>> lm.transitions
array([[ 1.08700000e+03, 4.40000000e+01, 4.00000000e+00,
3.40000000e+01],
[ 4.10000000e+01, 4.70000000e+02, 3.60000000e+01,
1.00000000e+00],
[ 5.00000000e+00, 3.40000000e+01, 1.42200000e+03,
3.90000000e+01],
[ 3.00000000e+01, 1.00000000e+00, 4.00000000e+01,
5.52000000e+02]])
and the estimated transition probability matrix is:
>>> lm.p
matrix([[ 0.92985458, 0.03763901, 0.00342173, 0.02908469],
[ 0.07481752, 0.85766423, 0.06569343, 0.00182482],
[ 0.00333333, 0.02266667, 0.948 , 0.026 ],
[ 0.04815409, 0.00160514, 0.06420546, 0.88603531]])
The diagonal elements indicate the staying probabilities, and we see that there is greater mobility for observations in quadrants 2 and 4 than in quadrants 1 and 3.
The implied long run steady state distribution of the chain is
>>> lm.steady_state
matrix([[ 0.28561505],
[ 0.14190226],
[ 0.40493672],
[ 0.16754598]])
again reflecting the dominance of quadrants 1 and 3 (positive autocorrelation). [3] Finally the first mean passage time for the LISAs is:
>>> pysal.ergodic.fmpt(lm.p)
matrix([[ 3.50121609, 37.93025465, 40.55772829, 43.17412009],
[ 31.72800152, 7.04710419, 28.68182751, 49.91485137],
[ 52.44489385, 47.42097495, 2.46952168, 43.75609676],
[ 38.76794022, 51.51755827, 26.31568558, 5.96851095]])
Rank Based Methods¶
The second set of spatial dynamic methods in PySAL are based on rank correlations and spatial extensions of the classic rank statistics.
Spatial Rank Correlation¶
Kendall’s τ is based on a comparison of the number of pairs of observations that have concordant ranks between two variables. For spatial dynamics in PySAL, the two variables in question are the values of an attribute measured at two points in time over spatial units. This classic measure of rank correlation indicates how much relative stability there has been in the map pattern over the two periods.
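The concordance calculation itself is easy to sketch: every pair (i, j) is concordant if the two variables rank i and j in the same order, discordant if they rank them oppositely (ties are ignored here for simplicity; kendall_tau is a hypothetical helper, not PySAL's implementation):

```python
def kendall_tau(x, y):
    # tau = (concordant - discordant) / number of pairs
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2.0)

kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])  # identical rankings -> 1.0
kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])  # reversed rankings -> -1.0
```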
The spatial τ decomposes these pairs into those that are spatial neighbors and those that are not, and examines whether the rank correlation is different between the two sets. [4] To illustrate this we turn to the case of regional incomes in Mexico over the 1940 to 2010 period:
>>> import pysal
>>> f = pysal.open("../pysal/examples/mexico.csv")
>>> vnames = ["pcgdp%d"%dec for dec in range(1940, 2010, 10)]
>>> y = np.transpose(np.array([f.by_col[v] for v in vnames]))
We also introduce the concept of regime weights that defines the neighbor set as those spatial units belonging to the same region. In this example the variable “esquivel99” represents a categorical classification of Mexican states into regions:
>>> regime = np.array(f.by_col['esquivel99'])
>>> w = pysal.weights.block_weights(regime)
>>> np.random.seed(12345)
Now we will calculate the spatial tau for decade transitions from 1940 through 2000 and report the observed spatial tau against that expected if the rank changes were randomly distributed in space by using 99 permutations:
>>> res=[pysal.SpatialTau(y[:,i],y[:,i+1],w,99) for i in range(6)]
>>> for r in res:
... ev = r.taus.mean()
... "%8.3f %8.3f %8.3f"%(r.tau_spatial, ev, r.tau_spatial_psim)
...
' 0.397 0.659 0.010'
' 0.492 0.706 0.010'
' 0.651 0.772 0.020'
' 0.714 0.752 0.210'
' 0.683 0.705 0.270'
' 0.810 0.819 0.280'
The observed level of spatial concordance during the 1940-50 transition was 0.397, which is significantly lower (p=0.010) than the average level of spatial concordance (0.659) from randomly permuted incomes in Mexico. Similar patterns are found for the next two transition periods as well. In other words, the amount of rank concordance is significantly distinct between pairs of observations that are geographical neighbors and those that are not in these first three transition periods. This reflects the greater degree of spatial similarity within rather than between the regimes, making the discordant pairs dominated by neighboring pairs.
Rank Decomposition¶
For a sequence of time periods, Theta measures the extent to which rank changes for a variable measured over locations are in the same direction within mutually exclusive and exhaustive partitions (regimes) of the locations.
Theta is defined as the sum of the absolute sum of rank changes within the regimes over the sum of all absolute rank changes. [4]
>>> import pysal
>>> f = pysal.open("../pysal/examples/mexico.csv")
>>> vnames = ["pcgdp%d"%dec for dec in range(1940, 2010, 10)]
>>> y = np.transpose(np.array([f.by_col[v] for v in vnames]))
>>> regime = np.array(f.by_col['esquivel99'])
>>> np.random.seed(10)
>>> t = pysal.Theta(y, regime, 999)
>>> t.theta
array([[ 0.41538462, 0.28070175, 0.61363636, 0.62222222, 0.33333333,
0.47222222]])
>>> t.pvalue_left
array([ 0.307, 0.077, 0.823, 0.552, 0.045, 0.735])
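A from-scratch sketch may make the definition concrete. Given ranks at two points in time and a regime label per location, Theta sums the absolute value of the net rank change within each regime and divides by the total absolute rank change (theta_stat is a hypothetical helper on toy data, not PySAL's implementation):

```python
def theta_stat(ranks_t0, ranks_t1, regimes):
    changes = [b - a for a, b in zip(ranks_t0, ranks_t1)]
    total = float(sum(abs(c) for c in changes))
    if total == 0:
        return 1.0  # no rank changes at all
    within = sum(abs(sum(c for c, g in zip(changes, regimes) if g == label))
                 for label in set(regimes))
    return within / total

# Rank swaps occur entirely within each regime: changes cancel, Theta = 0.
theta_stat([1, 2, 3, 4], [2, 1, 4, 3], ['a', 'a', 'b', 'b'])  # -> 0.0
# The same swaps cut across regimes: nothing cancels, Theta = 1.
theta_stat([1, 2, 3, 4], [2, 1, 4, 3], ['a', 'b', 'a', 'b'])  # -> 1.0
```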
Space-Time Interaction Tests¶
The third set of spatial dynamic methods in PySAL are global tests of space-time interaction. The purpose of these tests is to detect clustering within space-time event patterns. These patterns are composed of unique events that are labeled with spatial and temporal coordinates. The tests are designed to detect clustering of events in both space and time beyond “any purely spatial or purely temporal clustering” [5], that is, to determine if the events are “interacting.” Essentially, the tests examine the dataset to determine if pairs of events closest to each other in space are also those closest to each other in time. The null hypothesis of these tests is that the examined events are distributed randomly in space and time, i.e. the distance between pairs of events in space is independent of the distance in time. Three tests are currently implemented in PySAL: the Knox test, the Mantel test and the Jacquez Nearest Neighbors test. These tests have been widely applied in epidemiology, criminology and biology. A more in-depth technical review of these methods is available in [6].
Knox Test¶
The Knox test for space-time interaction employs user-defined critical thresholds in space and time to define proximity between events. All pairs of events are examined to determine if the distance between them in space and time is within the respective thresholds. The Knox statistic is calculated as the total number of event pairs where the spatial and temporal distances separating the pair are within the specified thresholds [7]. If interaction is present, the test statistic will be large. Significance is traditionally established using a Monte Carlo permutation method where event timestamps are permuted and the statistic is recalculated. This procedure is repeated to generate a distribution of statistics which is used to establish the pseudo-significance of the observed test statistic. This approach assumes a static underlying population from which events are drawn. If this is not the case the results may be biased [8].
Formally, the specification of the Knox test is given as:

X = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{s} a_{ij}^{t}

where n = number of events, a_{ij}^{s} = adjacency in space, a_{ij}^{t} = adjacency in time, d_{ij}^{s} = distance in space, and d_{ij}^{t} = distance in time, with a_{ij}^{s} = 1 if d_{ij}^{s} < \delta (and 0 otherwise) and a_{ij}^{t} = 1 if d_{ij}^{t} < \tau. The critical space and time distance thresholds are defined as \delta and \tau, respectively.
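The count described above reduces to a double loop over event pairs. The sketch below counts unordered pairs adjacent in both space and time, using tiny synthetic coordinates rather than the Burkitt data (knox_stat is a hypothetical helper, not PySAL's implementation):

```python
def knox_stat(coords, times, delta, tau):
    # count pairs within delta in space AND within tau in time
    n = len(times)
    stat = 0
    for i in range(n):
        for j in range(i + 1, n):
            d_s = ((coords[i][0] - coords[j][0]) ** 2 +
                   (coords[i][1] - coords[j][1]) ** 2) ** 0.5
            d_t = abs(times[i] - times[j])
            if d_s < delta and d_t < tau:
                stat += 1
    return stat

coords = [(0, 0), (1, 0), (10, 10), (11, 10)]
times = [1, 2, 3, 30]
knox_stat(coords, times, delta=5, tau=5)  # only pair (0, 1) qualifies -> 1
```

Pair (2, 3) is close in space but 27 time units apart, so it does not contribute.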
We illustrate the use of the Knox test using data from a study of Burkitt’s Lymphoma in Uganda during the period 1961-75 [9]. We start by importing Numpy, PySAL and the interaction module:
>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
The example data are then read in and used to create an instance of SpaceTimeEvents. This reformats the data so the test can be run by PySAL. This class requires the input of a point shapefile. The shapefile must contain a column that includes a timestamp for each point in the dataset. The class requires that the user input a path to an appropriate shapefile and the name of the column containing the timestamp. In this example, the appropriate column name is ‘T’.
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')
Next, we run the Knox test with distance and time thresholds of 20 and 5, respectively. This counts the events that are closer than 20 units in space, and 5 units in time.
>>> result = interaction.knox(events.space, events.t ,delta=20,tau=5,permutations=99)
Finally we examine the results. We call the statistic from the results dictionary. This reports that there are 13 events close in both space and time, based on our threshold definitions.
>>> print(result['stat'])
13
Then we look at the pseudo-significance of this value, calculated by permuting the timestamps and rerunning the statistic. Here, 99 permutations were used, but an alternative number can be specified by the user. In this case, the results indicate that we fail to reject the null hypothesis of no space-time interaction using an alpha value of 0.05.
>>> print("%2.2f"%result['pvalue'])
0.17
Modified Knox Test¶
A modification to the Knox test was proposed by Baker [10]. Baker’s modification measures the difference between the original observed Knox statistic and its expected value. This difference serves as the test statistic. Again, the significance of this statistic is assessed using a Monte Carlo permutation procedure.
Formally, the modified Knox statistic is:

T = \frac{1}{2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{s} a_{ij}^{t} - \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{s} \, \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^{t}}{n(n-1)} \right]

where n = number of events, and a_{ij}^{s} and a_{ij}^{t} are adjacency in space and time, calculated in a manner equivalent to the Knox test above. The first part of this statistic is equivalent to the original Knox test, while the second part is the expected value under spatio-temporal randomness.
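The observed-minus-expected construction can be sketched the same way: tally the spatial and temporal adjacency indicators over ordered pairs, then subtract the expected joint count under independence (modified_knox_stat is a hypothetical helper on synthetic data, not PySAL's implementation):

```python
def modified_knox_stat(coords, times, delta, tau):
    n = len(times)
    n_s = n_t = n_st = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_s = ((coords[i][0] - coords[j][0]) ** 2 +
                   (coords[i][1] - coords[j][1]) ** 2) ** 0.5
            d_t = abs(times[i] - times[j])
            s_adj = d_s < delta
            t_adj = d_t < tau
            n_s += s_adj
            n_t += t_adj
            n_st += s_adj and t_adj
    # observed joint count minus its expectation under independence
    return 0.5 * (n_st - float(n_s) * n_t / (n * (n - 1)))

coords = [(0, 0), (1, 0), (100, 100), (101, 100)]  # two spatial clusters
times = [0, 1, 100, 101]                           # two temporal clusters
modified_knox_stat(coords, times, delta=5, tau=5)  # positive: clustering
```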
Here we illustrate the use of the modified Knox test using the data on Burkitt’s Lymphoma cases in Uganda from above. We start by importing Numpy, PySAL and the interaction module. The example data are then read in and used to create an instance of SpaceTimeEvents.
>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')
Next, we run the modified Knox test with distance and time thresholds of 20 and 5, respectively. This counts the events that are closer than 20 units in space, and 5 units in time.
>>> result = interaction.modified_knox(events.space, events.t,delta=20,tau=5,permutations=99)
Finally we examine the results. We call the statistic from the results dictionary. This reports a statistic value of 2.810160.
>>> print("%2.8f"%result['stat'])
2.81016043
Next we look at the pseudo-significance of this value, calculated by permuting the timestamps and rerunning the statistic. Here, 99 permutations were used, but an alternative number can be specified by the user. In this case, the results indicate that we fail to reject the null hypothesis of no space-time interaction using an alpha value of 0.05.
>>> print("%2.2f"%result['pvalue'])
0.11
Mantel Test¶
Akin to the Knox test in its simplicity, the Mantel test keeps the distance information discarded by the Knox test. The unstandardized Mantel statistic is calculated by summing the product of the spatial and temporal distances between all event pairs [11]. To prevent multiplication by 0 in instances of co-located or simultaneous events, Mantel proposed adding a constant to the distance measurements. Additionally, he suggested a reciprocal transform of the resulting distance measurement to lessen the effect of the larger distances on the product sum. The test is defined formally below:
Z = \sum_{i=1}^{n} \sum_{j=1}^{n} (d_{ij}^{s} + c)^{p} (d_{ij}^{t} + c)^{p}

where, again, d_{ij}^{s} and d_{ij}^{t} denote distance in space and time, respectively. The constant, c, and the power, p, are parameters set by the user; the default values are 0 and 1, respectively. A standardized version of the Mantel test is implemented here in PySAL, however. The standardized statistic (r) is a measure of correlation between the spatial and temporal distance matrices. This is expressed formally as:

r = \frac{1}{n^{2} - n - 1} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ \frac{d_{ij}^{s} - \bar{d}^{s}}{\sigma_{d^{s}}} \right] \left[ \frac{d_{ij}^{t} - \bar{d}^{t}}{\sigma_{d^{t}}} \right]

where \bar{d}^{s} refers to the average distance in space, and \bar{d}^{t} the average distance in time. For notational convenience, \sigma_{d^{s}} and \sigma_{d^{t}} refer to the sample (not population) standard deviations of distance in space and time, respectively. The same constant and power transformations may also be applied to the spatial and temporal distance matrices employed by the standardized Mantel. Significance is determined through a Monte Carlo permutation approach similar to that employed in the Knox test.
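A sketch of the standardized statistic without the constant and power options: collect the pairwise spatial and temporal distances and correlate them using sample standard deviations. The helper name mantel_r and the normalization by the number of pairs minus one are assumptions of this sketch, not PySAL's exact implementation:

```python
def mantel_r(coords, times):
    # pairwise distances in space and time
    d_s, d_t = [], []
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            d_s.append(((coords[i][0] - coords[j][0]) ** 2 +
                        (coords[i][1] - coords[j][1]) ** 2) ** 0.5)
            d_t.append(float(abs(times[i] - times[j])))
    m = len(d_s)
    mean_s, mean_t = sum(d_s) / float(m), sum(d_t) / float(m)
    sd_s = (sum((d - mean_s) ** 2 for d in d_s) / (m - 1)) ** 0.5
    sd_t = (sum((d - mean_t) ** 2 for d in d_t) / (m - 1)) ** 0.5
    return sum((a - mean_s) * (b - mean_t)
               for a, b in zip(d_s, d_t)) / ((m - 1) * sd_s * sd_t)

# Events whose times mirror their positions along a line: the two distance
# matrices are identical, so the correlation is perfect.
coords = [(0, 0), (1, 0), (2, 0), (4, 0)]
times = [0, 1, 2, 4]
mantel_r(coords, times)  # -> 1.0 (up to floating point)
```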
Again, we use the Burkitt’s Lymphoma data to illustrate the test. We start with the usual imports and read in the example data.
>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')
The following example runs the standardized Mantel test with constants of 0 and transformations of 1, meaning the distance matrices will remain unchanged; however, as recommended by Mantel, a small constant should be added and an inverse transformation (i.e. -1) specified.
>>> result = interaction.mantel(events.space, events.t,99,scon=0.0,spow=1.0,tcon=0.0,tpow=1.0)
Next, we examine the result of the test.
>>> print("%6.6f"%result['stat'])
0.014154
Finally, we look at the pseudo-significance of this value, calculated by permuting the timestamps and rerunning the statistic for each of the 99 permutations. Again, note, the number of permutations can be changed by the user. According to these parameters, the results fail to reject the null hypothesis of no space-time interaction between the events.
>>> print("%2.2f"%result['pvalue'])
0.27
Jacquez Test¶
Instead of using a set distance in space and time to determine proximity (like the Knox test), the Jacquez test employs a nearest neighbor distance approach. This allows the test to account for changes in underlying population density. The statistic is calculated as the number of event pairs that are among each other’s k nearest neighbors in both space and time [12]. Significance of this count is established using a Monte Carlo permutation method. The test is expressed formally as:
J_{k} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ijk}^{s} a_{ijk}^{t}

where n = number of cases; a_{ijk}^{s} = adjacency in space (1 if event j is among the k nearest neighbors of event i in space, and 0 otherwise); a_{ijk}^{t} = adjacency in time (defined analogously). To illustrate the test, the Burkitt’s Lymphoma data are employed again. We start with the usual imports and read in the example data.
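The counting logic can be sketched directly: build each event's k nearest neighbor set in space and in time, then count ordered pairs that appear in both sets (jacquez_stat is a hypothetical helper on synthetic data; breaking distance ties by index order is an assumption of this sketch):

```python
def jacquez_stat(coords, times, k):
    n = len(times)

    def knn_sets(dist):
        # for each event, the set of its k nearest neighbors under dist
        sets = []
        for i in range(n):
            order = sorted((dist(i, j), j) for j in range(n) if j != i)
            sets.append(set(j for _, j in order[:k]))
        return sets

    d_s = lambda i, j: ((coords[i][0] - coords[j][0]) ** 2 +
                        (coords[i][1] - coords[j][1]) ** 2) ** 0.5
    d_t = lambda i, j: abs(times[i] - times[j])
    nn_s, nn_t = knn_sets(d_s), knn_sets(d_t)
    return sum(1 for i in range(n) for j in nn_s[i] if j in nn_t[i])

# Two space-time clusters: every event's nearest neighbor in space is also
# its nearest neighbor in time, so all four ordered pairs count.
coords = [(0, 0), (1, 0), (5, 0), (6, 0)]
times = [0, 1, 10, 11]
jacquez_stat(coords, times, k=1)  # -> 4
```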
>>> import numpy as np
>>> import pysal
>>> import pysal.spatial_dynamics.interaction as interaction
>>> np.random.seed(100)
>>> path = "../pysal/examples/burkitt"
>>> events = interaction.SpaceTimeEvents(path,'T')
The following runs the Jacquez test on the example data for a value of k = 3 and reports the resulting statistic. In this case, there are 13 instances where events are nearest neighbors in both space and time. The significance of this can be assessed by calling the pvalue from the results dictionary. Again, there is not enough evidence to reject the null hypothesis of no space-time interaction.
>>> result = interaction.jacquez(events.space, events.t ,k=3,permutations=99)
>>> print result['stat']
13
>>> print "%3.1f"%result['pvalue']
0.2
Spatial Dynamics API¶
For further details see the Spatial Dynamics API.
Footnotes
[1]  Rey, S.J. 2001. “Spatial empirics for economic growth and convergence.” Geographical Analysis, 33, 195-214. 
[2]  The states are ordered alphabetically. 
[3]  The complex values of the steady state distribution arise from complex eigenvalues in the transition probability matrix which may indicate cyclicality in the chain. 
[4]  Rey, S.J. (2004) “Spatial dependence in the evolution of regional income distributions,” in A. Getis, J. Mur and H. Zoeller (eds). Spatial Econometrics and Spatial Statistics. Palgrave, London, pp. 194-213. 
[5]  Kulldorff, M. (1998). Statistical methods for spatial epidemiology: tests for randomness. In Gatrell, A. and Loytonen, M., editors, GIS and Health, pages 49–62. Taylor & Francis, London. 
[6]  Tango, T. (2010). Statistical Methods for Disease Clustering. Springer, New York. 
[7]  Knox, E. (1964). The detection of space-time interactions. Journal of the Royal Statistical Society. Series C (Applied Statistics), 13(1):25–30. 
[8]  R.D. Baker. (2004). Identifying space-time disease clusters. Acta Tropica, 91(3):291-299. 
[9]  Kulldorff, M. and Hjalmars, U. (1999). The Knox method and other tests for space-time interaction. Biometrics, 55(2):544–552. 
[10]  Williams, E., Smith, P., Day, N., Geser, A., Ellice, J., and Tukei, P. (1978). Space-time clustering of Burkitt’s lymphoma in the West Nile district of Uganda: 1961-1975. British Journal of Cancer, 37(1):109. 
[11]  Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27(2):209–220. 
[12]  Jacquez, G. (1996). A k nearest neighbour test for spacetime interaction. Statistics in Medicine, 15(18):1935–1949. 
Using PySAL with Shapely for GIS Operations¶
New in version 1.3.
Introduction¶
The Shapely project is a BSD-licensed Python package for manipulation and analysis of planar geometric objects, and depends on the widely used GEOS library.
PySAL supports interoperation with the Shapely library through Shapely’s Python
Geo Interface. All PySAL geometries provide a __geo_interface__ property which
models the geometries as a GeoJSON object. Shapely geometry objects also export
the __geo_interface__ property and can be adapted to PySAL geometries using
the pysal.cg.asShape
function.
Additionally, PySAL provides an optional contrib module that handles the conversion between PySAL and Shapely data structures for you. The module can be found at pysal.contrib.shapely_ext.
Installation¶
Please refer to the Shapely website for instructions on installing Shapely and its dependencies, without which PySAL’s Shapely extension will not work.
Usage¶
Using the Python Geo Interface...
>>> import pysal
>>> import shapely.geometry
>>> # The get_path function returns the absolute system path to pysal's
>>> # included example files no matter where they are installed on the system.
>>> fpath = pysal.examples.get_path('stl_hom.shp')
>>> # Now, open the shapefile using pysal's FileIO
>>> shps = pysal.open(fpath , 'r')
>>> # We can read a polygon...
>>> polygon = shps.next()
>>> # To use this polygon with shapely we simply convert it with
>>> # Shapely's asShape method.
>>> polygon = shapely.geometry.asShape(polygon)
>>> # now we can operate on our polygons like normal shapely objects...
>>> print "%.4f"%polygon.area
0.1701
>>> # We can do things like buffering...
>>> eroded_polygon = polygon.buffer(-0.01)
>>> print "%.4f"%eroded_polygon.area
0.1533
>>> # and containment testing...
>>> polygon.contains(eroded_polygon)
True
>>> eroded_polygon.contains(polygon)
False
>>> # To go back to pysal shapes we call pysal.cg.asShape...
>>> eroded_polygon = pysal.cg.asShape(eroded_polygon)
>>> type(eroded_polygon)
<class 'pysal.cg.shapes.Polygon'>
Using The PySAL shapely_ext module...
>>> import pysal
>>> from pysal.contrib import shapely_ext
>>> fpath = pysal.examples.get_path('stl_hom.shp')
>>> shps = pysal.open(fpath , 'r')
>>> polygon = shps.next()
>>> eroded_polygon = shapely_ext.buffer(polygon, -0.01)
>>> print "%0.4f"%eroded_polygon.area
0.1533
>>> shapely_ext.contains(polygon,eroded_polygon)
True
>>> shapely_ext.contains(eroded_polygon,polygon)
False
>>> type(eroded_polygon)
<class 'pysal.cg.shapes.Polygon'>
PySAL: Example Data Sets¶
PySAL comes with a number of example data sets that are used in some of the documentation strings in the source code. All the example data sets can be found in the examples directory.
10740¶
Polygon shapefile for Albuquerque, New Mexico.
 10740.dbf: attribute database file
 10740.shp: shapefile
 10740.shx: spatial index
 10740_queen.gal: queen contiguity GAL format
 10740_rook.gal: rook contiguity GAL format
book¶
Synthetic data to illustrate spatial weights. Source: Anselin, L. and S.J. Rey (in progress) Spatial Econometrics: Foundations.
 book.gal: rook contiguity for regular lattice
 book.txt: attribute data for regular lattice
calempdensity¶
Employment density for California counties. Source: Anselin, L. and S.J. Rey (in progress) Spatial Econometrics: Foundations.
 calempdensity.csv: data on employment and employment density in California counties.
chicago77¶
Chicago Community Areas (n=77). Source: Anselin, L. and S.J. Rey (in progress) Spatial Econometrics: Foundations.
 Chicago77.dbf: attribute data
 Chicago77.shp: shapefile
 Chicago77.shx: spatial index
desmith¶
Example data for autocorrelation analysis. Source: de Smith et al (2009) Geospatial Analysis (Used with permission)
 desmith.txt: attribute data for 10 spatial units
 desmith.gal: spatial weights in GAL format
juvenile¶
Cardiff juvenile delinquent residences.
 juvenile.dbf: attribute data
 juvenile.html: documentation
 juvenile.shp: shapefile
 juvenile.shx: spatial index
 juvenile.gwt: spatial weights in GWT format
mexico¶
State regional income, Mexican states, 1940–2000. Source: Rey, S.J. and M.L. Sastre Gutierrez. "Interregional inequality dynamics in Mexico." Spatial Economic Analysis. Forthcoming.
 mexico.csv: attribute data
 mexico.gal: spatial weights in GAL format
rook31¶
Small test shapefile
 rook31.dbf: attribute data
 rook31.gal: spatial weights in GAL format
 rook31.shp: shapefile
 rook31.shx: spatial index
sacramento2¶
1998 and 2001 Zip Code Business Patterns (Census Bureau) for Sacramento MSA
 sacramento2.dbf
 sacramento2.sbn
 sacramento2.sbx
 sacramento2.shp
 sacramento2.shx
shp_test¶
Sample shapefiles used only for testing purposes. Each example includes a ".shp" shapefile, ".shx" shapefile index, ".dbf" DBase file, and a ".prj" ESRI projection file.
Examples include:
 Point: Example of an ESRI Shapefile of Type 1 (Point).
 Line: Example of an ESRI Shapefile of Type 3 (Line).
 Polygon: Example of an ESRI Shapefile of Type 5 (Polygon).
sids2¶
North Carolina county SIDS death counts and rates
 sids2.dbf: attribute data
 sids2.html: documentation
 sids2.shp: shapefile
 sids2.shx: spatial index
 sids2.gal: GAL file for spatial weights
stl_hom¶
Homicides and selected socioeconomic characteristics for counties surrounding St Louis, MO. Data aggregated for three time periods: 1979–84 (steady decline in homicides), 1984–88 (stable period), and 1988–93 (steady increase in homicides). Source: S. Messner, L. Anselin, D. Hawkins, G. Deane, S. Tolnay, R. Baller (2000). An Atlas of the Spatial Patterning of County-Level Homicide, 1960–1990. Pittsburgh, PA, National Consortium on Violence Research (NCOVR).
 stl_hom.html: Metadata
 stl_hom.txt: txt file with attribute data
 stl_hom.wkt: A Well-Known Text representation of the geometry.
 stl_hom.csv: attribute data and WKT geometry.
 stl.hom.gal: GAL file for spatial weights
US Regional Incomes¶
Per capita income for the lower 48 US states, 1929–2010
 us48.shp: shapefile
 us48.dbf: dbf for shapefile
 us48.shx: index for shapefile
 usjoin.csv: attribute data (comma delimited file)
Virginia¶
Virginia Counties Shapefile.
 virginia.shp: Shapefile
 virginia.shx: shapefile index
 virginia.dbf: attributes
 virginia.prj: shapefile projection
Next Steps with PySAL¶
The tutorials you have (hopefully) just gone through should be enough to get you going with PySAL. They covered some, but not all, of the modules in PySAL, and even then only a selection of the functionality of the classes included in the tutorials. To learn more about PySAL you should consult the documentation.
PySAL is an open source, community-based project and we highly value contributions from individuals to the project. There are many ways to contribute, from filing bug reports, suggesting feature requests, and helping with documentation, to becoming a developer. Individuals interested in joining the team should send an email to pysal-dev@googlegroups.com or contact one of the developers directly.
Developer Guide¶
Go to our issues queue on GitHub NOW!
Guidelines¶
PySAL is adopting many of the conventions of the larger scientific computing in Python community, and we ask that anyone interested in joining the project please review the following documents:
Open Source Development¶
PySAL is an open source project and we invite any interested user who wants to contribute to the project to contact one of the team members. Users who are new to open source development may want to consult the following documents for background information:
Source Code¶
PySAL uses git and GitHub for our code repository.
Please see our procedures and policies for development on GitHub as well as how to configure your local git for development.
You can set up PySAL for local development following the installation instructions.
Development Mailing List¶
Development discussions take place on the pysal-dev mailing list and the gitter room.
Release Schedule¶
As of version 1.11, PySAL has moved to a rolling release model. Discussions about releases are carried out during the monthly developer meetings and in the gitter room.
Governance¶
PySAL is organized around the Benevolent Dictator for Life (BDFL) model of project management. The BDFL is responsible for overall project management and direction. Developers have a critical role in shaping that direction. Specific roles and rights are as follows:
Title      Role              Rights
BDFL       Project Director  Commit, Voting, Veto, Developer Approval/Management
Developer  Development       Commit, Voting
Voting and PEPs¶
During the initial phase of a release cycle, new functionality for PySAL should be described in a PySAL Enhancement Proposal (PEP). These should follow the standard format used by the Python project. For PySAL, the PEP process is as follows:
 Developer prepares a plain text PEP following the guidelines
 Developer sends PEP to the BDFL
 Developer posts PEP to the PEP index
 All developers consider the PEP and vote
 PEPs receiving a majority approval become priorities for the release cycle
PySAL Testing Procedures¶
As of PySAL release 1.6, continuous integration testing was ported to the Travis-CI hosted testing framework (http://travis-ci.org). There is integration within GitHub that provides Travis-CI test results on a pending Pull Request page, so developers can know before merging a Pull Request whether the changes will induce breakage.
Take a moment to read about the Pull Request development model on our wiki at https://github.com/pysal/pysal/wiki/GitHub-Standard-Operating-Procedures
PySAL relies on two different modes of testing: [1] integration (regression) testing and [2] doctests. All developers responsible for given packages shall utilize both modes.
Integration Testing¶
Each package shall have a directory tests in which unit test scripts for each module in the package directory are required. For example, in the directory pysal/esda, the module moran.py requires a unit test script named test_moran.py. The path for this script needs to be pysal/esda/tests/test_moran.py.
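A sketch of that convention is below. The statistic under test is a hypothetical stand-in, not the real Moran class; only the file layout and unittest structure are the point.

```python
# Sketch of a pysal/esda/tests/test_moran.py-style script. The function
# moran_like_statistic is a hypothetical placeholder for the real code
# under test.
import unittest

def moran_like_statistic(values):
    # placeholder for the statistic exported by the module under test
    return sum(values) / float(len(values))

class TestMoran(unittest.TestCase):
    def setUp(self):
        # small fixed example data keeps results reproducible
        self.y = [1.0, 2.0, 3.0]

    def test_statistic(self):
        # compare against a known value to catch regressions
        self.assertAlmostEqual(moran_like_statistic(self.y), 2.0)

# nose (or unittest itself) discovers and runs TestMoran; run it
# explicitly here for illustration:
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMoran)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Because nose discovers test classes by name, keeping the TestModuleName naming pattern shown here lets the package-wide suite pick up the script automatically.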
To ensure that any changes made to one package/module do not introduce breakage in the wider project, developers should run the package-wide test suite using nose before making any commits. As of release version 1.5, all tests must pass using a 64-bit version of Python. To run the new test suite, install nose, nose-progressive, and nose-exclude into your working Python installation. If you're using EPD, nose is already available:
pip install -U nose
pip install nose-progressive
pip install nose-exclude
Then:
cd trunk/
nosetests pysal/
You can also run the test suite from within a Python session. Note, however, that Python will exit at the conclusion of the test:
import pysal
import nose
nose.runmodule('pysal')
The file setup.cfg (added in revision 1050) in trunk holds nose configuration variables. When nosetests is run from trunk, nose reads those configuration parameters into its operation, so developers do not need to specify the optional flags on the command line as shown below.
To specify running just a subset of the tests, you can also run:
nosetests pysal/esda/
or any other directory, for instance, to run just those tests. To run the entire unittest test suite plus all of the doctests, run:
nosetests --with-doctest pysal/
To exclude a specific directory or directories, install nose-exclude from PyPI (pip install nose-exclude). Then run it like this:
nosetests -v --exclude-dir=pysal/contrib --with-doctest pysal/
Note that you'll probably run into an IOError complaining about too many open files. To fix that, pass this via the command line:
ulimit -S -n 1024
That changes the machine’s open file limit for just the current terminal session.
The trunk should almost always be in a state where all tests pass.
Generating Unit Tests¶
A useful development companion is the package pythoscope. It scans package folders and produces test script stubs for your modules that fail until you write the tests – a pesky but useful trait. Using pythoscope in the most basic way requires just two simple command line calls:
pythoscope init
pythoscope <my_module>.py
One caveat: pythoscope does not name your test classes in a PySAL-friendly way, so you'll have to rename each test class after the test scripts are generated. Nose finds tests!
Docstrings and Doctests¶
All public classes and functions should include examples in their docstrings. Those examples serve two purposes:
 Documentation for users
 Tests to ensure code behavior is aligned with the documentation
Doctests will be executed when building PySAL documentation with Sphinx.
Developers should run tests manually before committing any changes that may potentially affect usability. Developers can run doctests (docstring tests) manually from the command line using nosetests:
nosetests --with-doctest pysal/
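For illustration, a docstring example looks like this (the function is hypothetical); nose's doctest plugin, like the stdlib doctest machinery used below, executes each >>> line and compares the result against the expected output printed beneath it.

```python
import doctest

def row_standardize(weights):
    """Rescale a list of weights so they sum to one.

    Examples
    --------
    >>> row_standardize([1.0, 1.0, 2.0])
    [0.25, 0.25, 0.5]
    """
    total = sum(weights)
    return [w / total for w in weights]

# Collect and run the examples embedded in the docstring above
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(row_standardize):
    runner.run(test)
print(runner.failures)  # 0 when every example matches its expected output
```

This is why doctest examples should use small, deterministic inputs: the comparison is against the literal printed output.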
Tutorial Doctests¶
All of the tutorials are tested along with the overall test suite. Developers can test their changes against the tutorial docstrings by cd’ing into /doc/ and running:
make doctest
PySAL Enhancement Proposals (PEP)¶
PEP 0001 Spatial Dynamics Module¶
Author: Serge Rey <sjsrey@gmail.com>, Xinyue Ye <xinyue.ye@gmail.com>
Status: Approved 1.0
Created: 18-Jan-2010
Updated: 09-Feb-2010
Abstract¶
With the increasing availability of spatial longitudinal data sets there is a growing demand for exploratory methods that integrate both the spatial and temporal dimensions of the data. The spatial dynamics module combines a number of previously developed and to-be-developed classes for the analysis of spatial dynamics. It will include classes for the following statistics for spatial dynamics: Markov, spatial Markov, rank mobility, spatial rank mobility, and space-time LISA.
Motivation¶
Rather than having each of the spatial dynamics as separate modules in PySAL, it makes sense to move them all within the same module. This would facilitate common signatures for constructors and similar forms of data structures for space-time analysis (and generation of results).
The module would implement some of the ideas for extending LISA statistics to a dynamic context ([Anselin2000] [ReyJanikas2006]), and recent work developing empirics and summary measures for comparative space-time analysis ([ReyYe2010]).
Reference Implementation¶
We suggest adding the module pysal.spatialdynamics, which in turn would encompass the following modules:
 rank mobility: rank concordance (relative mobility or internal mixing); Kendall's index
 spatial rank mobility: adds a spatial dimension to rank mobility; investigates the extent to which relative mobility is spatially dependent; uses various types of spatial weight matrix
 Markov: empirical transition probability matrix (mobility across classes); Shorrock's index
 Spatial Markov: adds a spatial dimension (regional conditioning) to classic Markov models; a trace statistic from a modified Markov transition matrix; investigates the extent to which inter-class mobility is spatially dependent
 Space-Time LISA: extends LISA measures to integrate the time dimension; combined with the cg (computational geometry) module to develop comparative measurements
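As a rough sketch of the first step of the Markov component (the data coding and interface here are assumed for illustration; the eventual module's API may differ), an empirical transition probability matrix can be estimated by counting class-to-class moves over consecutive periods:

```python
from collections import defaultdict

def transition_matrix(sequences, classes):
    """Empirical Markov transition probabilities estimated by counting
    class-to-class moves over consecutive time periods."""
    counts = {c: defaultdict(int) for c in classes}
    for seq in sequences:
        # each adjacent pair (a, b) is one observed transition a -> b
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        # row-normalize the counts into probabilities
        probs[a] = {b: counts[a][b] / float(total) if total else 0.0
                    for b in classes}
    return probs

# Three units observed over five periods, coded low/high ('L'/'H')
seqs = [['L', 'L', 'H', 'H', 'H'],
        ['L', 'H', 'H', 'L', 'L'],
        ['H', 'H', 'H', 'H', 'L']]
P = transition_matrix(seqs, ['L', 'H'])
print(P['L'])
```

The spatial Markov extension would condition these counts on the class of each unit's neighbors, producing one such matrix per regional context.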
References¶
[Anselin2000]  Anselin, Luc (2000) Computing environments for spatial data analysis. Journal of Geographical Systems 2: 201–220.
[ReyJanikas2006]  Rey, S.J. and M.V. Janikas (2006) STARS: Space-Time Analysis of Regional Systems. Geographical Analysis 38: 67–86.
[ReyYe2010]  Rey, S.J. and X. Ye (2010) Comparative spatial dynamics of regional systems. In Paez, A. et al. (eds) Progress in Spatial Analysis: Methods and Applications. Springer: Berlin, 441–463.
PEP 0002 Residential Segregation Module¶
Author: David C. Folch <david.folch@asu.edu>, Serge Rey <srey@asu.edu>
Status: Draft
Created: 10-Feb-2010
Updated:
Abstract¶
The segregation module combines a number of previously developed and to-be-developed measures for the analysis of residential segregation. It will include classes for two-group and multi-group aspatial (classic) segregation indices along with their spatialized counterparts. Local segregation indices will also be included.
Motivation¶
The study of residential segregation continues to be a popular field in empirical social science and public policy development. While some of the classic measures are relatively simple to implement, the spatial versions are not nearly as straightforward for the average user. Furthermore, there does not appear to be a Python implementation of residential segregation measures currently available. There is a standalone C#.Net GUI implementation (http://www.ucs.inrs.ca/inc/Groupes/LASER/Segregation.zip) containing many of the measures to be implemented via this PEP, but it is Windows-only and I could not get it to run easily (it is not open source but the author sent me the code).
It has been noted that there is no one-size-fits-all segregation index; however, some are clearly more popular than others. This module would bring together a wide variety of measures to allow users to easily compare the results from different indices.
Reference Implementation¶
We suggest adding the module pysal.segregation, which in turn would encompass the following modules:
 globalSeg
 localSeg
References¶
PEP 0003 Spatial Smoothing Module¶
Author: Myunghwa Hwang <mhwang4@gmail.com>, Luc Anselin <luc.anselin@asu.edu>, Serge Rey <srey@asu.edu>
Status: Approved 1.0
Created: 11-Feb-2010
Updated:
Abstract¶
Spatial smoothing techniques aim to address the problems that arise when simple normalization is applied to rate computation. Geographic studies of disease widely adopt these techniques to better summarize spatial patterns of disease occurrences. The smoothing module combines a number of previously developed and to-be-developed classes for carrying out spatial smoothing. It will include classes for the following techniques: mean and median based smoothing, nonparametric smoothing, and empirical Bayes smoothing.
Motivation¶
Despite the wide usage of spatial smoothing techniques in epidemiology, only a few software libraries bring together a range of different smoothing techniques in one place. Since spatial smoothing is a subtype of exploratory data analysis, PySAL is a natural home for multiple smoothing techniques.
The smoothing module will mainly implement the techniques reported in [Anselin2006].
Reference Implementation¶
We suggest adding the module pysal.esda.smoothing, which in turn would encompass the following modules:
 locally weighted averages, locally weighted median, headbanging
 spatial rate smoothing
 excess risk, empirical Bayes smoothing, spatial empirical Bayes smoothing
 headbanging
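To make the empirical Bayes entry concrete, here is a minimal sketch of global EB rate smoothing using a method-of-moments prior (an assumed formulation for illustration, not the module's eventual implementation): each raw rate is shrunk toward the overall mean rate, with less shrinkage for units whose populations make their raw rate more reliable.

```python
def eb_smooth(events, pops):
    """Global empirical Bayes rate smoothing (illustrative sketch):
    shrink each raw rate toward the overall mean rate, weighting by
    the stability of its denominator."""
    n = len(events)
    rates = [e / float(p) for e, p in zip(events, pops)]
    P = float(sum(pops))
    m = sum(events) / P                       # overall mean rate (prior mean)
    # method-of-moments prior variance, floored at zero
    v = sum(p * (r - m) ** 2 for p, r in zip(pops, rates)) / P - m / (P / n)
    v = max(v, 0.0)
    smoothed = []
    for r, p in zip(rates, pops):
        w = v / (v + m / p) if (v + m / p) > 0 else 1.0
        smoothed.append(w * r + (1.0 - w) * m)  # shrink toward the prior mean
    return smoothed

# Small units with extreme raw rates get pulled toward the global rate
events = [5, 20, 1, 50]
pops = [100.0, 1000.0, 50.0, 5000.0]
print(eb_smooth(events, pops))
```

The spatial EB variant in the proposal replaces the global reference rate with one computed over each unit's spatial neighborhood.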
References¶
[Anselin2006] Anselin, L., N. Lozano, and J. Koschinsky (2006) Rate Transformations and Smoothing, GeoDa Center Research Report.
PEP 0004 Geographically Nested Inequality based on the Geary Statistic¶
Author: Boris Dev <boris.dev@gmail.com>, Charles Schmidt <schmidtc@gmail.com>
Status: Draft
Created: 9-Aug-2010
Updated:
Abstract¶
I propose to extend the Geary statistic to describe inequality patterns between people in the same geographic zones. Geographically nested associations can be represented with a spatial weights matrix defined jointly using both geographic and social positions. The key class in the proposed geographically nested inequality module would subclass pysal.esda.geary with two extensions: 1) an additional argument, an array of regimes to represent social space; and 2) for the output, spatially nested randomizations will be performed for pseudo-significance tests.
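Since the proposal builds directly on the Geary statistic, a minimal illustrative implementation of Geary's C with binary weights is sketched below (the proposed module would additionally define or restrict the weight pairs using the regime array):

```python
def gearys_c(values, neighbors):
    """Geary's C for an attribute given a binary contiguity structure.
    neighbors maps each unit index to the indices it is joined to,
    with each join listed in both directions."""
    n = len(values)
    mean = sum(values) / float(n)
    ssd = sum((x - mean) ** 2 for x in values)
    num = 0.0
    s0 = 0  # sum of the weights
    for i, js in neighbors.items():
        for j in js:
            num += (values[i] - values[j]) ** 2
            s0 += 1
    # C < 1 indicates positive spatial association, C > 1 negative
    return ((n - 1) * num) / (2.0 * s0 * ssd)

# Four units on a line: 0-1-2-3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(gearys_c([1.0, 2.0, 3.0, 4.0], neighbors))
```

The nested extension amounts to keeping only those (i, j) pairs that share a geographic zone while differing in the social regime of interest.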
Motivation¶
Geographically nested measures may reveal inequality patterns that are masked by conventional aggregate approaches. Aggregate human inequality statistics summarize the size of the gaps in variables such as mortality rate or income level between different groups of people. A geographically nested measure is computed using only a pairwise subset of the values, defined by common location in the same geographic zone. For example, this type of measure was proposed in my dissertation to assess changes in income inequality between nearby blocks of different school attendance zones or different racial neighborhoods within the same cities. Since there are no standard statistical packages to do this sort of analysis, such a pairwise approach to inequality analysis across many geographic zones is currently tedious for researchers who are non-hackers. Since it will take advantage of the existing pysal.esda.geary and pysal.weights.regime_weights(), the proposed module should be readable for hackers.
Reference Implementation¶
I suggest adding the module pysal.inequality.nested.
References¶
[Dev2010] Dev, B. (2010) “Assessing Inequality using Geographic Income Distributions: Spatial Data Analysis of States, Neighborhoods, and School Attendance Zones” http://dl.dropbox.com/u/408103/dissertation.pdf.
PEP 0005 Space Time Event Clustering Module¶
Author: Nicholas Malizia <nmalizia@gmail.com>, Serge Rey <sjsrey@gmail.com>
Status: Approved 1.1
Created: 13-Jul-2010
Updated: 06-Oct-2010
Abstract¶
The space-time event clustering module will be an addition (in the form of a submodule) to the spatial dynamics module. The purpose of this module will be to house all methods concerned with identifying clusters within spatio-temporal event data. The module will include classes for the major methods for spatio-temporal event clustering, including the Knox, Mantel, Jacquez k nearest neighbors, and space-time K function tests. Although these methods are tests of global spatio-temporal clustering, it is our aim to eventually extend this module to include to-be-developed methods for local spatio-temporal clustering.
Motivation¶
While the methods of the parent module are concerned with the dynamics of aggregate lattice-based data, the methods encompassed in this submodule will focus on exploring the dynamics of individual events. The methods suggested here have historically been utilized by researchers looking for clusters of events in the fields of epidemiology and criminology. Currently, these methods are not widely available in open-source implementations. Although the Knox, Mantel, and Jacquez methods are available in the commercial, GUI-based software ClusterSeer, they do not appear to be implemented in an open-source context; also, as implemented in ClusterSeer, the methods are not scriptable [1]. The space-time K function, however, is available in an open-source context in the splancs package for R [2]. The combination of these methods in this module would be a unique, scriptable, open-source resource for researchers interested in the spatio-temporal interaction of event-based data.
Reference Implementation¶
We suggest adding the module pysal.spatialdynamics.events, which in turn would encompass the following modules:
 Knox
 The Knox test for space-time interaction sets critical distances in space and time; if the data are clustered, numerous pairs of events will be located within both of these critical distances and the test statistic will be large [3]. Significance will be established using a Monte Carlo method: either the time stamp or location of the events is scrambled and the statistic is calculated again. This procedure is repeated to generate a distribution of statistics (for the null hypothesis of spatio-temporal randomness), which is used to establish the pseudo-significance of the observed test statistic. Options will be given to specify a range of critical distances for the space and time scales.
 Mantel
 Akin to the Knox test in its simplicity, the Mantel test keeps the distance information discarded by the Knox test. The Mantel statistic is calculated by summing the product of the distances between all the pairs of events [4]. Again, significance will be determined through a Monte Carlo approach.
 Jacquez
 This test tallies the number of event pairs that are within k-nearest neighbors of each other in both space and time. Significance of this count is established using a Monte Carlo permutation method [5]. Again, the permutation is done by randomizing either the time or location of the events and then running the statistic again. The test should be implemented with the additional descriptives as suggested by [6].
 Space-Time K
 The space-time K function takes the K function, which has been used to detect clustering in spatial point patterns, and expands it to the realm of spatio-temporal data. Essentially, the method calculates K functions in space and time independently and then compares the product of these functions with a K function which takes both dimensions of space and time into account from the start [7]. Significance is established through Monte Carlo methods and the construction of confidence envelopes.
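The Knox procedure described above can be sketched as follows (illustrative only: the data, critical distances, and function signature are placeholders, not the module's eventual API):

```python
import random

def knox(coords, times, delta, tau, permutations=99, seed=42):
    """Knox space-time statistic: count event pairs close in both space
    (within delta) and time (within tau), then build a reference
    distribution by permuting the time stamps."""
    def stat(ts):
        count = 0
        n = len(coords)
        for i in range(n):
            for j in range(i + 1, n):
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                close_in_space = (dx * dx + dy * dy) ** 0.5 <= delta
                if close_in_space and abs(ts[i] - ts[j]) <= tau:
                    count += 1
        return count

    observed = stat(times)
    rng = random.Random(seed)
    ge = 1  # the observed arrangement counts as one realization
    for _ in range(permutations):
        shuffled = times[:]
        rng.shuffle(shuffled)  # break any space-time link, keep marginals
        if stat(shuffled) >= observed:
            ge += 1
    pseudo_p = ge / float(permutations + 1)
    return observed, pseudo_p

# Two tight space-time pairs, far apart from each other in both dimensions
coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
times = [1.0, 2.0, 10.0, 11.0]
observed, p = knox(coords, times, delta=1.0, tau=2.0)
print(observed, p)
```

The same permutation scaffolding carries over to the Mantel and Jacquez tests; only the pair statistic changes.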
References¶
[1]
[2]
[3]
[4]
[5]
[6]
[7]
PEP 0006 Kernel Density Estimation¶
Author: Serge Rey <sjsrey@gmail.com>, Charles Schmidt <schmidtc@gmail.com>
Status: Draft
Created: 11-Oct-2010
Updated: 11-Oct-2010
Abstract¶
The kernel density estimation module will provide a uniform interface to a set of kernel density estimation (KDE) methods. Currently KDE is used in various places within PySAL (e.g., Kernel, Kernel_Smoother) as well as in STARS and various projects within the GeoDa Center, but these implementations were done separately. This module would centralize KDE within PySAL as well as extend the suite of KDE methods and related measures available in PySAL.
Motivation¶
KDE is widely used throughout spatial analysis, from estimation of process intensity in point pattern analysis to derivation of spatial weights, geographically weighted regression, rate smoothing, and hot spot detection, among others.
Reference Implementation¶
Since KDE would be used throughout existing (and likely future) modules in PySAL, it makes sense to implement it as a top level module in PySAL.
Core KDE methods that would be implemented include:
 triangular
 uniform
 quadratic
 quartic
 gaussian
Additional classes and methods to deal with KDE on restricted spaces would also be implemented.
A unified KDE API would be developed for use of the module.
Computational optimization would form a significant component of the effort for this PEP.
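The core kernels listed above have standard functional forms; a sketch on the scaled distance z = d / bandwidth is given below (the function names are illustrative, outside any eventual unified API):

```python
import math

# Candidate kernel weight functions on the scaled distance z = d / h.
# All except the gaussian are zero outside |z| > 1.

def triangular(z):
    return max(1.0 - abs(z), 0.0)

def uniform(z):
    return 0.5 if abs(z) <= 1 else 0.0

def quadratic(z):  # also known as the Epanechnikov kernel
    return 0.75 * (1.0 - z * z) if abs(z) <= 1 else 0.0

def quartic(z):
    return (15.0 / 16.0) * (1.0 - z * z) ** 2 if abs(z) <= 1 else 0.0

def gaussian(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Every kernel peaks at z = 0 and decays with distance
for k in (triangular, uniform, quadratic, quartic, gaussian):
    print(k.__name__, round(k(0.0), 4), round(k(0.5), 4))
```

A unified API would mainly standardize how the bandwidth h is chosen and how these weights are accumulated over points, which is where the compact-support kernels allow the spatial-index optimizations this PEP anticipates.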
References¶
in progress
PEP 0007 Spatial Econometrics¶
Author: Luc Anselin <luc.anselin@asu.edu>, Serge Rey <sjsrey@gmail.com>, David Folch <dfolch@asu.edu>, Daniel Arribas-Bel <daniel.arribas.bel@gmail.com>, Pedro Amaral <pvmda2@cam.ac.uk>, Nicholas Malizia <nmalizia@gmail.com>, Ran Wei <rwei5@asu.edu>, Jing Yao <jyao13@asu.edu>, Elizabeth Mack <Elizabeth.A.Mack@asu.edu>
Status: Approved 1.1
Created: 12-Oct-2010
Updated: 12-Oct-2010
Abstract¶
The spatial econometrics module will provide a uniform interface to the spatial econometric functionality contained in the former PySpace and current GeoDaSpace efforts. This module would centralize all specification, estimation, diagnostic testing and prediction/simulation for spatial econometric models.
Motivation¶
Spatial econometric methodology is at the core of GeoDa and GeoDaSpace. This module would allow access to state of the art methods at the source code level.
Reference Implementation¶
We suggest adding the module pysal.spreg. As development progresses, there may be a need for submodules dealing with pure cross-sectional regression, spatial panel models, and spatial probit.
Core methods to be implemented include:
 OLS estimation with diagnostics for spatial effects
 2SLS estimation with diagnostics for spatial effects
 spatial 2SLS for spatial lag model (with endogeneity)
 GM and GMM estimation for spatial error model
 GMM spatial error with heteroskedasticity
 spatial HAC estimation
A significant component of the effort for this PEP would consist of implementing methods with good performance on very large data sets, exploiting sparse matrix operations in scipy.
References¶
[1] Anselin, L. (1988). Spatial Econometrics, Methods and Models. Kluwer, Dordrecht.
[2] Anselin, L. (2006). Spatial econometrics. In Mills, T. and Patterson, K., editors, Palgrave Handbook of Econometrics, Volume I, Econometric Theory, pp. 901–969. Palgrave Macmillan, Basingstoke.
[3] Arraiz, I., Drukker, D., Kelejian, H.H., and Prucha, I.R. (2010). A spatial Cliff-Ord-type model with heteroskedastic innovations: small and large sample results. Journal of Regional Science 50: 592–614.
[4] Kelejian, H.H. and Prucha, I.R. (1998). A generalized spatial two stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17: 99–121.
[5] Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40: 509–533.
[6] Kelejian, H.H. and Prucha, I.R. (2007). HAC estimation in a spatial framework. Journal of Econometrics 140: 131–154.
[7] Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics (forthcoming).
PEP 0008 Spatial Database Module¶
Author: Phil Stephens <phil.stphns@gmail.com>, Serge Rey <sjsrey@gmail.com>
Status: Draft
Created: 09-Sep-2010
Updated: 31-Aug-2012
Abstract¶
A spatial database module will extend PySAL file I/O capabilities to spatial database software, allowing PySAL users to connect to and perform geographic lookups and queries on spatial databases.
Motivation¶
PySAL currently reads and writes geometry only in the shapefile data structure. Spatially indexed databases permit queries on the geometric relations between objects [1].
Reference Implementation¶
We propose to add the module pysal.contrib.spatialdb, hereafter referred to simply as spatialdb. spatialdb will leverage the Python Object Relational Mapper (ORM) libraries SQLAlchemy [2] and GeoAlchemy [3], MIT-licensed software that provides a database-agnostic SQL layer for several different databases and spatial database extensions, including PostgreSQL/PostGIS, Oracle Spatial, Spatialite, MS SQL Server, MySQL Spatial, and others. These lightweight libraries manage database connections, transactions, and SQL expression translation.
Another option to research is the GeoDjango package. It provides a large number of spatial lookups [5] and geo queries for PostGIS databases, and a smaller set of lookups/queries for Oracle, MySQL, and SpatiaLite.
References¶
[1]  OpenGeo (2010) Spatial Database Tips and Tricks. Accessed September 9, 2010. 
[2]  SQLAlchemy (2010) SQLAlchemy 0.6.5 Documentation. Accessed October 4, 2010. 
[3]  GeoAlchemy (2010) GeoAlchemy 0.4.1 Documentation. Accessed October 4, 2010. 
[4]  GeoAlchemy (2012) GeoAlchemy on GitHub. Accessed August 9, 2012. 
[5]  GeoDjango (2012) GeoDjango Compatibility Tables. Accessed August 31, 2012. 
PEP 0009 Add Python 3.x Support¶
Author: Charles Schmidt <schmidtc@gmail.com>
Status: Approved 1.2
Created: 02-Feb-2011
Updated: 02-Feb-2011
Abstract¶
Python 2.x is being phased out in favor of the backwards-incompatible Python 3 line. In order to stay relevant to the Python community as a whole, PySAL needs to support the latest production releases of Python. With the release of Numpy 1.5 and the pending release of SciPy 0.9, all PySAL dependencies support Python 3. This PEP proposes porting the code base to support both the 2.x and 3.x lines of Python.
Motivation¶
Python 2.7 is the final major release in the 2.x line. The Python 2.x line will continue to receive bug fixes; however, only the 3.x line will receive new features ([Python271]). Python 3.x introduces many backward-incompatible changes to Python ([PythonNewIn3]). Numpy added support for Python 3.0 in version 1.5 ([NumpyANN150]). Scipy 0.9.0 is currently in the release candidate stage and supports Python 3.0 ([SciPyRoadmap], [SciPyANN090rc2]). Many of the new features in Python 2.7 were backported from 3.0, allowing us to start using some of the new features of the language without abandoning our 2.x users.
Reference Implementation¶
Since Python 2.6 the interpreter has included a '-3' command-line switch to "warn about Python 3.x incompatibilities that 2to3 cannot trivially fix" ([Python2to3]). Running the PySAL tests with this switch produces no warnings internal to PySAL. This suggests porting to 3.x will require only trivial changes to the code. A porting strategy is provided by [PythonNewIn3].
References¶
[Python271]  http://www.python.org/download/releases/2.7.1/
[PythonNewIn3]  (1, 2) http://docs.python.org/release/3.0.1/whatsnew/3.0.html
[Python2to3]  http://docs.python.org/release/3.0.1/library/2to3.html#2to3-reference
[NumpyANN150]  http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052522.html
[SciPyRoadmap]  http://projects.scipy.org/scipy/roadmap#python-3
[SciPyANN090rc2]  http://mail.scipy.org/pipermail/scipy-dev/2011-January/015927.html
PEP 0010 Add pure Python rtree¶
Author  Serge Rey <sjsrey@gmail.com> 
Status  Approved 1.2 
Created  12-Feb-2011 
Updated  12-Feb-2011 
Abstract¶
A pure Python implementation of an R-tree will be developed for use in the construction of spatial weights matrices based on contiguity relations in shapefiles, as well as supporting a spatial index that can be used by GUI-based applications built with PySAL requiring brushing and linking.
Motivation¶
As of 1.1, PySAL checks if the external library ([Rtree]) is installed. If it is not, then an internal binning algorithm is used to determine contiguity relations in shapefiles for the construction of certain spatial weights. A pure Python implementation of R-trees may provide improved cross-platform efficiency when the external Rtree library is not present. At the same time, such an implementation can be relied on by application developers using PySAL who wish to build visualization applications supporting brushing, linking and other interactions requiring spatial indices for object selection.
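The binning idea can be sketched with a toy spatial hash: each shape's bounding box is assigned to the grid cells it touches, so only shapes sharing a cell need an exact contiguity test. This is a simplified illustration of the approach, not PySAL's actual implementation:

```python
from collections import defaultdict

def build_bins(bboxes, cell=1.0):
    """Map each bounding box (left, lower, right, upper) to the grid
    cells it touches; returns {cell: set of shape ids}."""
    bins = defaultdict(set)
    for i, (l, b, r, t) in enumerate(bboxes):
        for cx in range(int(l // cell), int(r // cell) + 1):
            for cy in range(int(b // cell), int(t // cell) + 1):
                bins[(cx, cy)].add(i)
    return bins

def candidate_pairs(bins):
    """Pairs of shapes sharing at least one cell: the only pairs that
    need an exact contiguity check."""
    pairs = set()
    for ids in bins.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs

boxes = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5), (5, 5, 6, 6)]
print(candidate_pairs(build_bins(boxes)))  # only shapes 0 and 1 share cells
```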
PEP 0011 Move from Google Code to Github¶
Author  Serge Rey <sjsrey@gmail.com> 
Status  Draft 
Created  04-Aug-2012 
Updated  04-Aug-2012 
Abstract¶
This proposal is to move the PySAL code repository from Google Code to Github.
Motivation¶
Git is a decentralized version control system that brings a number of benefits:
 distributed development
 offline development
 elegant and lightweight branching
 fast operations
 flexible workflows
among many others.
The two main PySAL dependencies, SciPy and NumPy, made the switch to GitHub roughly two years ago. In discussions with members of those development teams and related projects (pandas, statsmodels) it is clear that git is gaining widespread adoption in the Python scientific computing community. By moving to git and GitHub, PySAL would benefit by facilitating interaction with developers in this community. Discussions with developers at SciPy 2012 indicated that all projects experienced significant growth in community involvement after the move to GitHub. Other projects considering such a move have been discussing similar issues.
Moving to GitHub would also streamline the administration of project updates, documentation and related tasks. The Google Code infrastructure requires updates in multiple locations, which results in either additional work or neglected changes during releases. GitHub understands Markdown and reStructuredText formats; the latter is heavily used in PySAL documentation and the former is clearly preferred to wiki markup on Google Code.
Although there is a learning curve to Git, it is relatively minor for developers familiar with Subversion, as all PySAL developers are. Moreover, several of the developers have been using Git and GitHub for other projects and have expressed interest in such a move. There are excellent online resources for learning more about git, such as this book.
Reference Implementation¶
Moving code and history¶
There are utilities, such as svn2git that can be used to convert an SVN repo to a git repo.
The converted git repo would then be pushed to a GitHub account.
Setting up post (commit/push/pull) hooks¶
Migration of the current integration testing will be required. GitHub has support for Post-Receive Hooks that can be used for this aspect of the migration.
Moving issues tracking over¶
A decision about whether to move the issue tracking over to Github will have to be considered. This has been handled in different ways:
 keep using Google Code for issue tracking
 move all issues (even closed ones) over to Github
 freeze tickets at Google Code and have a breadcrumb for active tickets pointing to issue tracker at Github
If we decide to move the issues over we may look at tratihubus as well as other possibilities.
Continuous integration with Travis-CI¶
Travis-CI is a hosted Continuous Integration (CI) service that is integrated with GitHub. This sponsored service provides:
 testing with multiple versions of Python
 testing with multiple versions of project dependencies (numpy and scipy)
 build history
 integrated GitHub commit hooks
 testing against multiple database services
Configuration is achieved with a single YAML file, reducing development overhead, maintenance, and monitoring.
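As a rough sketch, a configuration in this spirit might look like the following (the version numbers and commands are illustrative assumptions, not PySAL's actual settings):

```yaml
# .travis.yml -- illustrative sketch only
language: python
python:
  - "2.6"
  - "2.7"
install:
  - pip install numpy scipy
script:
  - python setup.py test
```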
Code Sprint for GitHub migration¶
The proposal is to organize a future sprint to focus on this migration.
PySAL Documentation¶
Contents
Writing Documentation¶
The PySAL project contains two distinct forms of documentation: inline and non-inline. Inline docs are contained in the source code itself, in what are known as docstrings. Non-inline documentation is in the doc folder in the trunk.
Inline documentation is processed with an extension to Sphinx called napoleon. We have adopted the community standard outlined here.
PySAL makes use of the built-in Sphinx extension viewcode, which allows the reader to quickly toggle between docs and source code. To use it, the source code module requires at least one properly formatted docstring.
Editors of non-inline documentation can opt to strike through older documentation rather than delete it, using a custom “role” directive. Near the top of the document, add the role directive:
.. role:: strike
Then, to strike through old text, apply the :strike: role and offset the text with backticks. For example, the rendered sentence “This strikethrough is produced like this:” comes from the source:
This :strike:`strikethrough` is produced like this:
Compiling Documentation¶
PySAL documentation is built using Sphinx and the Sphinx extension napoleon, which formats PySAL’s docstrings.
Note¶
If you’re using Sphinx version 1.3 or newer, napoleon is included and should be called in the main conf.py as sphinx.ext.napoleon rather than installing it as we show below.
If you’re using a version of Sphinx that does not ship with napoleon (Sphinx < 1.3), you’ll need napoleon version 0.2.4 or later and Sphinx version 1.0 or later to compile the documentation. Both modules are available at the Python Package Index, and can be downloaded and installed from the command line using pip or easy_install:
$ easy_install sphinx
$ easy_install sphinxcontrib-napoleon
or
$ pip install sphinx
$ pip install sphinxcontrib-napoleon
If you get a permission error, try using ‘sudo’.
The source for the docs is in doc. Building the documentation is done as follows (assuming sphinx and napoleon are already installed):
$ cd doc; ls
build Makefile source
$ make clean
$ make html
To see the results in a browser open build/html/index.html. To make changes, edit (or add) the relevant files in source and rebuild the docs using the ‘make html’ (or ‘make clean’ if you’re adding new documents) command. Consult the Sphinx markup guide for details on the syntax and structure of the files in source.
Once you’re happy with your changes, check in the source files. Do not add or check in files under build since they are dynamically built.
Changes checked in to GitHub will be propagated to readthedocs within a few minutes.
Lightweight Editing with rst2html.py¶
Because the doc build process can sometimes be lengthy, you may want to avoid having to do a full build until after you are done with your major edits on one particular document. As part of the docutils package, the file rst2html.py can take an rst document and generate the html file. This will get most of the work done that you need to get a sense of whether your edits are good, without having to rebuild all the PySAL docs. As of version 0.8 it also understands LaTeX. It will cough on some Sphinx directives, but those can be dealt with in the final build.
To use this, download the docutils tarball and put rst2html.py somewhere in your path. In vim (on Mac OS X) you can then add something like:
map ;r ^[:!rst2html.py % > ~/tmp/tmp.html; open ~/tmp/tmp.html^M^M
which will render the html in your default browser.
Things to watch out for¶
If you encounter a failing tutorial doctest that does not seem to be in error, it could be a difference in whitespace between the expected and received output. In that case, add an ‘options’ line as follows:
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> print 'a b c'
a b c
Adding a new package and modules¶
To include the docstrings of a new module in the API docs the following steps are required:
 In the directory /doc/source/library add a directory with the name of the new package. You can skip to step 3 if the package exists and you are just adding new modules to this package.
 Within /doc/source/library/packageName add a file index.rst
 For each new module in this package, add a file moduleName.rst and update the index.rst file to include moduleName.
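For example, the index.rst for a hypothetical new package could contain a toctree along these lines (packageName and the module names are placeholders):

```rst
packageName
===========

.. toctree::
   :maxdepth: 1

   moduleOne
   moduleTwo
```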
Adding a new tutorial: spreg¶
While the API docs are automatically generated when compiling with Sphinx, tutorials that demonstrate use cases for new modules need to be crafted by the developer. Below we use the case of one particular module that currently does not have a tutorial as a guide for how to add tutorials for new modules.
As of PySAL 1.3 there are API docs for spreg but no tutorial currently exists for this module.
We will fix this and add a tutorial for spreg.
Requirements¶
 sphinx
 napoleon
 pysal sources
You can install sphinx or napoleon using easy_install as described above in Writing Documentation.
Where to add the tutorial content¶
Within the PySAL source the docs live in:
pysal/doc/source
This directory has the source reStructuredText files used to render the html pages. The tutorial pages live under:
pysal/doc/source/users/tutorials
As of PySAL 1.3, the content of this directory is:
autocorrelation.rst fileio.rst next.rst smoothing.rst
dynamics.rst index.rst region.rst weights.rst
examples.rst intro.rst shapely.rst
The body of the index.rst file lists the sections for the tutorials:
Introduction to the Tutorials <intro>
File Input and Output <fileio>
Spatial Weights <weights>
Spatial Autocorrelation <autocorrelation>
Spatial Smoothing <smoothing>
Regionalization <region>
Spatial Dynamics <dynamics>
Shapely Extension <shapely>
Next Steps <next>
Sample Datasets <examples>
In order to add a tutorial for spreg we need to change this to read:
Introduction to the Tutorials <intro>
File Input and Output <fileio>
Spatial Weights <weights>
Spatial Autocorrelation <autocorrelation>
Spatial Smoothing <smoothing>
Spatial Regression <spreg>
Regionalization <region>
Spatial Dynamics <dynamics>
Shapely Extension <shapely>
Next Steps <next>
Sample Datasets <examples>
So we are adding a new section that will show up as Spatial Regression and its contents will be found in the file spreg.rst. To create the latter file, simply copy, say, dynamics.rst to spreg.rst and then modify spreg.rst to have the correct content.
Once this is done, move back up to the top level doc directory:
pysal/doc
Then:
$ make clean
$ make html
Point your browser to pysal/doc/build/html/index.html
and check your work. You can then make changes to the spreg.rst file and recompile until you are set with the content.
Proper Reference Formatting¶
For proper hypertext linking of reference material, each unique reference in a single python module can only be explicitly named once. Take the following example for instance:
References

.. [1] Kelejian, H.R., Prucha, I.R. (1998) "A generalized spatial
two-stage least squares procedure for estimating a spatial autoregressive
model with autoregressive disturbances". The Journal of Real Estate
Finance and Economics, 17, 1.
It is “named” as “1”. Any other references (even the same paper) with the same “name” will cause a Duplicate Reference error when Sphinx compiles the document. Several workarounds are available but no consensus has emerged.
One possible solution is to use an anonymous reference on any subsequent duplicates, signified by a single underscore with no brackets. Another solution is to put all document references together at the bottom of the document, rather than listing them at the bottom of each class, as has been done in some modules.
PySAL Release Management¶
Contents
Prepare the release¶
Create a branch out of the dev branch
Check all tests pass. See PySAL Testing Procedures.
Update CHANGELOG:
$ python tools/github_stats.py days >> chglog
where days is the number of days to start the logs at
Prepend chglog to CHANGELOG and edit
Edit THANKS and README and README.md if needed.
Edit the file version.py to update MAJOR, MINOR, MICRO
Bump:
$ cd tools; python bump.py
Commit all changes.
Push your branch up to your GitHub repo
On GitHub, issue a pull request with a target of upstream master. Add a comment that this is for release.
Tag¶
With version 1.10 we changed the way the tags are created to reflect our policy of not pushing directly to upstream.
After you have issued a pull request, the project maintainer can create the release on GitHub.
Make and Upload and Test Distributions¶
On each build machine, clone and checkout the newly created tag (assuming that is v1.10 in what follows):
$ git clone http://github.com/pysal/pysal.git
$ cd pysal
$ git fetch --tags
$ git checkout v1.10
Make and upload to the Testing Python Package Index:
$ python setup.py sdist upload -r https://testpypi.python.org/pypi
Test that your package can install correctly:
$ pip install --extra-index-url https://testpypi.python.org/pypi pysal
If all is good, proceed, otherwise fix, and repeat.
Make and upload to the Python Package Index in one shot!:
$ python setup.py sdist (to test it)
$ python setup.py sdist upload
 If not registered, do so. Follow the prompts. You can save the login credentials in a dotfile, .pypirc
Make and upload the Windows installer to SourceForge. On a Windows box, build the installer like so:
$ python setup.py bdist_wininst
Announce¶
Draft and distribute a press release on geodacenter.asu.edu, the openspace list, and pysal.org
 On GeoDa center website, do this:
 Login and expand the wrench icon to reveal the Admin menu
 Click “Administer”, “Content Management”, “Content”
 Next, click “List”, filter by type, and select “Featured Project”.
 Click “Filter”
Now you will see the list of Featured Projects. Find “PySAL”.
 Choose to ‘edit’ PySAL and modify the short text there. This changes the text users see on the homepage slider.
 Clicking on the name “PySAL” allows you to edit the content of the PySAL project page, which is also the “About PySAL” page linked to from the homepage slider.
Put master back to dev¶
 Change MAJOR, MINOR version in setup script.
 Change pysal/version.py to dev number
 Change the docs version from X.x to X.xdev by editing doc/source/conf.py in two places.
 Update the release schedule in doc/source/developers/guidelines.rst
Update the github.io news page to announce the release.
PySAL and Python3¶
Contents
Starting with version 1.11, PySAL supports both the Python 2.x series (versions 2.6 and 2.7) and Python 3.4 and newer.
Projects Using PySAL¶
This page lists other software projects making use of PySAL. If your project is not listed here, contact one of the team members and we’ll add it.
GeoDa Center Projects¶
Known Issues¶
1.5 install fails with scipy 0.11.0 on Mac OS X¶
Running python setup.py install results in:
from _cephes import *
ImportError:
dlopen(/Users/serge/Documents/p/pysal/virtualenvs/python1.5/lib/python2.7/sitepackages/scipy/special/_cephes.so,
2): Symbol not found: _aswfa_
Referenced from:
/Users/serge/Documents/p/pysal/virtualenvs/python1.5/lib/python2.7/sitepackages/scipy/special/_cephes.so
Expected in: dynamic lookup
This occurs when your scipy on Mac OS X was compiled with gnu95 and not gfortran. See this thread for possible solutions.
weights.DistanceBand failing¶
This occurs due to a bug in scipy.sparse prior to version 0.8. If you are running such a version see Issue 73 for a fix.
doc tests and unit tests under Linux¶
Some Linux machines return different results for the unit and doc tests. We suspect this has to do with the way random seeds are set. See Issue 52
Library Reference¶
Release:  1.13.0 

Date:  Feb 12, 2017 
Python Spatial Analysis Library¶
The Python Spatial Analysis Library consists of several subpackages each addressing a different area of spatial analysis. In addition to these subpackages PySAL includes some general utilities used across all modules.
Subpackages¶
pysal.cg – Computational Geometry¶
cg.locators — Locators¶
The cg.locators module provides ....
New in version 1.0.
Computational geometry code for PySAL: Python Spatial Analysis Library.

class pysal.cg.locators.IntervalTree(intervals)[source]¶
Representation of an interval tree. An interval tree is a data structure which is used to quickly determine which intervals in a set contain a value or overlap with a query interval. [DeBerg2008]

query(q)[source]¶
Returns the intervals intersected by a value or interval.
query((number, number) or number) -> x list
Parameters: q (a value or interval to find intervals intersecting) –
Examples
>>> intervals = [(1, 2, 'A'), (5, 9, 'B'), (3, 6, 'C')]
>>> it = IntervalTree(intervals)
>>> it.query((7, 14))
['B']
>>> it.query(1)
['A']


class pysal.cg.locators.Grid(bounds, resolution)[source]¶
Representation of a binning data structure.

add(item, pt)[source]¶
Adds an item to the grid at a specified location.
add(x, Point) -> x
Parameters:  item (the item to insert into the grid) –
 pt (the location to insert the item at) –
Examples
>>> g = Grid(Rectangle(0, 0, 10, 10), 1)
>>> g.add('A', Point((4.2, 8.7)))
'A'

bounds(bounds)[source]¶
Returns a list of items found in the grid within the bounds specified.
bounds(Rectangle) -> x list
Parameters: bounds (the rectangular range to find items in) –
Examples
>>> g = Grid(Rectangle(0, 0, 10, 10), 1)
>>> g.add('A', Point((1.0, 1.0)))
'A'
>>> g.add('B', Point((4.0, 4.0)))
'B'
>>> g.bounds(Rectangle(0, 0, 3, 3))
['A']
>>> g.bounds(Rectangle(2, 2, 5, 5))
['B']
>>> sorted(g.bounds(Rectangle(0, 0, 5, 5)))
['A', 'B']

in_grid(loc)[source]¶
Returns whether a 2-tuple location loc lies inside the grid bounds.
Test tag: <tc>#is#Grid.in_grid</tc>

nearest(pt)[source]¶
Returns the nearest item to a point.
nearest(Point) -> x
Parameters: pt (the location to search near) –
Examples
>>> g = Grid(Rectangle(0, 0, 10, 10), 1)
>>> g.add('A', Point((1.0, 1.0)))
'A'
>>> g.add('B', Point((4.0, 4.0)))
'B'
>>> g.nearest(Point((2.0, 1.0)))
'A'
>>> g.nearest(Point((7.0, 5.0)))
'B'

proximity(pt, r)[source]¶
Returns a list of items found in the grid within a specified distance of a point.
proximity(Point, number) -> x list
Parameters:  pt (the location to search around) –
 r (the distance to search around the point) –
Examples
>>> g = Grid(Rectangle(0, 0, 10, 10), 1)
>>> g.add('A', Point((1.0, 1.0)))
'A'
>>> g.add('B', Point((4.0, 4.0)))
'B'
>>> g.proximity(Point((2.0, 1.0)), 2)
['A']
>>> g.proximity(Point((6.0, 5.0)), 3.0)
['B']
>>> sorted(g.proximity(Point((4.0, 1.0)), 4.0))
['A', 'B']

remove(item, pt)[source]¶
Removes an item from the grid at a specified location.
remove(x, Point) -> x
Parameters:  item (the item to remove from the grid) –
 pt (the location the item was added at) –
Examples
>>> g = Grid(Rectangle(0, 0, 10, 10), 1)
>>> g.add('A', Point((4.2, 8.7)))
'A'
>>> g.remove('A', Point((4.2, 8.7)))
'A'


class pysal.cg.locators.BruteForcePointLocator(points)[source]¶
A class which does naive linear search on a set of Point objects.

nearest(query_point)[source]¶
Returns the nearest point indexed to a query point.
nearest(Point) -> Point
Parameters: query_point (a point to find the nearest indexed point to) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = BruteForcePointLocator(points)
>>> n = pl.nearest(Point((1, 1)))
>>> str(n)
'(0.0, 0.0)'

proximity(origin, r)[source]¶
Returns the indexed points located within some distance of an origin point.
proximity(Point, number) -> Point list
Parameters:  origin (the point to find indexed points near) –
 r (the maximum distance to find indexed point from the origin point) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = BruteForcePointLocator(points)
>>> neighs = pl.proximity(Point((1, 0)), 2)
>>> len(neighs)
1
>>> p = neighs[0]
>>> isinstance(p, Point)
True
>>> str(p)
'(0.0, 0.0)'

region(region_rect)[source]¶
Returns the indexed points located inside a rectangular query region.
region(Rectangle) -> Point list
Parameters: region_rect (the rectangular range to find indexed points in) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = BruteForcePointLocator(points)
>>> pts = pl.region(Rectangle(1, 1, 10, 10))
>>> len(pts)
3


class pysal.cg.locators.PointLocator(points)[source]¶
An abstract representation of a point indexing data structure.

nearest(query_point)[source]¶
Returns the nearest point indexed to a query point.
nearest(Point) -> Point
Parameters: query_point (a point to find the nearest indexed point to) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = PointLocator(points)
>>> n = pl.nearest(Point((1, 1)))
>>> str(n)
'(0.0, 0.0)'

overlapping(region_rect)¶
Returns the indexed points located inside a rectangular query region.
region(Rectangle) -> Point list
Parameters: region_rect (the rectangular range to find indexed points in) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = PointLocator(points)
>>> pts = pl.region(Rectangle(1, 1, 10, 10))
>>> len(pts)
3

proximity(origin, r)[source]¶
Returns the indexed points located within some distance of an origin point.
proximity(Point, number) -> Point list
Parameters:  origin (the point to find indexed points near) –
 r (the maximum distance to find indexed point from the origin point) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = PointLocator(points)
>>> len(pl.proximity(Point((1, 0)), 2))
1

region(region_rect)[source]¶
Returns the indexed points located inside a rectangular query region.
region(Rectangle) -> Point list
Parameters: region_rect (the rectangular range to find indexed points in) –
Examples
>>> points = [Point((0, 0)), Point((1, 6)), Point((5.4, 1.4))]
>>> pl = PointLocator(points)
>>> pts = pl.region(Rectangle(1, 1, 10, 10))
>>> len(pts)
3


class pysal.cg.locators.PolygonLocator(polygons)[source]¶
An abstract representation of a polygon indexing data structure.

contains_point(point)[source]¶
Returns polygons that contain point.
Parameters: point (point (x,y)) –
Returns:
Return type: list of polygons containing point
Examples
>>> p1 = Polygon([Point((0,0)), Point((6,0)), Point((4,4))])
>>> p2 = Polygon([Point((1,2)), Point((4,0)), Point((4,4))])
>>> p1.contains_point((2,2))
1
>>> p2.contains_point((2,2))
1
>>> pl = PolygonLocator([p1, p2])
>>> len(pl.contains_point((2,2)))
2
>>> p2.contains_point((1,1))
0
>>> p1.contains_point((1,1))
1
>>> len(pl.contains_point((1,1)))
1
>>> p1.centroid
(3.3333333333333335, 1.3333333333333333)
>>> pl.contains_point((1,1))[0].centroid
(3.3333333333333335, 1.3333333333333333)

inside(query_rectangle)[source]¶
Returns polygons that are inside query_rectangle.
Examples
>>> p1 = Polygon([Point((0, 1)), Point((4, 5)), Point((5, 1))])
>>> p2 = Polygon([Point((3, 9)), Point((6, 7)), Point((1, 1))])
>>> p3 = Polygon([Point((7, 1)), Point((8, 7)), Point((9, 1))])
>>> pl = PolygonLocator([p1, p2, p3])
>>> qr = Rectangle(0, 0, 5, 5)
>>> res = pl.inside( qr )
>>> len(res)
1
>>> qr = Rectangle(3, 7, 5, 8)
>>> res = pl.inside( qr )
>>> len(res)
0
>>> qr = Rectangle(10, 10, 12, 12)
>>> res = pl.inside( qr )
>>> len(res)
0
>>> qr = Rectangle(0, 0, 12, 12)
>>> res = pl.inside( qr )
>>> len(res)
3
Notes
inside means the intersection of the query rectangle and a polygon is not empty and its area is equal to the area of the polygon
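For the special case of axis-aligned boxes, this containment predicate can be sketched in a few lines (plain tuples stand in for PySAL's Rectangle class; an illustration, not the library's code):

```python
def rect_inside(poly_box, query_box):
    """True when a polygon's (left, lower, right, upper) extent lies
    entirely within the query rectangle, i.e. the intersection of the
    two equals the polygon's own extent."""
    pl, pb, pr, pt = poly_box
    ql, qb, qr, qt = query_box
    return ql <= pl and qb <= pb and pr <= qr and pt <= qt

print(rect_inside((1, 1, 4, 4), (0, 0, 5, 5)))  # True: fully contained
print(rect_inside((1, 1, 6, 4), (0, 0, 5, 5)))  # False: spills past the right edge
```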

nearest(query_point, rule='vertex')[source]¶
Returns the nearest polygon indexed to a query point based on various rules.
nearest(Polygon) -> Polygon
Parameters:  query_point (a point to find the nearest indexed polygon to) –
 rule (representative point for polygon in nearest query.) –
   vertex – measures distance between vertices and query_point
   centroid – measures distance between centroid and query_point
   edge – measures the distance between edges and query_point
Examples
>>> p1 = Polygon([Point((0, 1)), Point((4, 5)), Point((5, 1))])
>>> p2 = Polygon([Point((3, 9)), Point((6, 7)), Point((1, 1))])
>>> pl = PolygonLocator([p1, p2])
>>> try:
...     n = pl.nearest(Point((1, 1)))
... except NotImplementedError:
...     print "future test: str(min(n.vertices())) == (0.0, 1.0)"
future test: str(min(n.vertices())) == (0.0, 1.0)

overlapping(query_rectangle)[source]¶
Returns list of polygons that overlap query_rectangle.
Examples
>>> p1 = Polygon([Point((0, 1)), Point((4, 5)), Point((5, 1))])
>>> p2 = Polygon([Point((3, 9)), Point((6, 7)), Point((1, 1))])
>>> p3 = Polygon([Point((7, 1)), Point((8, 7)), Point((9, 1))])
>>> pl = PolygonLocator([p1, p2, p3])
>>> qr = Rectangle(0, 0, 5, 5)
>>> res = pl.overlapping( qr )
>>> len(res)
2
>>> qr = Rectangle(3, 7, 5, 8)
>>> res = pl.overlapping( qr )
>>> len(res)
1
>>> qr = Rectangle(10, 10, 12, 12)
>>> res = pl.overlapping( qr )
>>> len(res)
0
>>> qr = Rectangle(0, 0, 12, 12)
>>> res = pl.overlapping( qr )
>>> len(res)
3
>>> qr = Rectangle(8, 3, 9, 4)
>>> p1 = Polygon([Point((2, 1)), Point((2, 3)), Point((4, 3)), Point((4,1))])
>>> p2 = Polygon([Point((7, 1)), Point((7, 5)), Point((10, 5)), Point((10, 1))])
>>> pl = PolygonLocator([p1, p2])
>>> res = pl.overlapping(qr)
>>> len(res)
1
Notes
overlapping means the intersection of the query rectangle and a polygon is not empty, and its area is no larger than the area of the polygon
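The underlying rectangle-overlap test can be sketched as follows (tuples stand in for PySAL's Rectangle class; an illustration only):

```python
def rects_overlap(a, b):
    """True when two (left, lower, right, upper) rectangles share any area."""
    al, ab_, ar, at = a
    bl, bb, br, bt = b
    # They overlap unless one lies entirely left of, right of,
    # above, or below the other.
    return al <= br and bl <= ar and ab_ <= bt and bb <= at

print(rects_overlap((0, 0, 5, 5), (4, 4, 8, 8)))  # True: corners overlap
print(rects_overlap((0, 0, 2, 2), (3, 3, 4, 4)))  # False: disjoint
```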

proximity(origin, r, rule='vertex')[source]¶
Returns the indexed polygons located within some distance of an origin point based on various rules.
proximity(Polygon, number) -> Polygon list
Parameters:  origin (the point to find indexed polygons near) –
 r (the maximum distance to find indexed polygon from the origin point) –
 rule (representative point for polygon in proximity query.) –
   vertex – measures distance between vertices and query_point
   centroid – measures distance between centroid and query_point
   edge – measures the distance between edges and query_point
Examples
>>> p1 = Polygon([Point((0, 1)), Point((4, 5)), Point((5, 1))])
>>> p2 = Polygon([Point((3, 9)), Point((6, 7)), Point((1, 1))])
>>> pl = PolygonLocator([p1, p2])
>>> try:
...     len(pl.proximity(Point((0, 0)), 2))
... except NotImplementedError:
...     print "future test: len(pl.proximity(Point((0, 0)), 2)) == 2"
future test: len(pl.proximity(Point((0, 0)), 2)) == 2

region(region_rect)[source]¶
Returns the indexed polygons located inside a rectangular query region.
region(Rectangle) -> Polygon list
Parameters: region_rect (the rectangular range to find indexed polygons in) –
Examples
>>> p1 = Polygon([Point((0, 1)), Point((4, 5)), Point((5, 1))])
>>> p2 = Polygon([Point((3, 9)), Point((6, 7)), Point((1, 1))])
>>> pl = PolygonLocator([p1, p2])
>>> n = pl.region(Rectangle(0, 0, 4, 10))
>>> len(n)
2

cg.shapes — Shapes¶
The cg.shapes module provides basic data structures.
New in version 1.0.
Computational geometry code for PySAL: Python Spatial Analysis Library.

class pysal.cg.shapes.LineSegment(start_pt, end_pt)[source]¶
Geometric representation of line segment objects.

p1¶
Point – Starting point

p2¶
Point – Ending point

bounding_box¶
tuple – The bounding box of the segment (number 4-tuple)

len¶
float – The length of the segment

line¶
Line – The line on which the segment lies

bounding_box¶
Returns the minimum bounding box of a LineSegment object.
Test tag: <tc>#is#LineSegment.bounding_box</tc> Test tag: <tc>#tests#LineSegment.bounding_box</tc>
bounding_box -> Rectangle
Examples
>>> ls = LineSegment(Point((1, 2)), Point((5, 6)))
>>> ls.bounding_box.left
1.0
>>> ls.bounding_box.lower
2.0
>>> ls.bounding_box.right
5.0
>>> ls.bounding_box.upper
6.0

get_swap()[source]¶
Returns a LineSegment object which has its endpoints swapped.
get_swap() -> LineSegment
Test tag: <tc>#is#LineSegment.get_swap</tc> Test tag: <tc>#tests#LineSegment.get_swap</tc>
Examples
>>> ls = LineSegment(Point((1, 2)), Point((5, 6)))
>>> swap = ls.get_swap()
>>> swap.p1[0]
5.0
>>> swap.p1[1]
6.0
>>> swap.p2[0]
1.0
>>> swap.p2[1]
2.0

intersect(other)[source]¶
Test whether segment intersects with other segment.
Handles endpoints of segments being on the other segment.
Examples
>>> ls = LineSegment(Point((5,0)), Point((10,0)))
>>> ls1 = LineSegment(Point((5,0)), Point((10,1)))
>>> ls.intersect(ls1)
True
>>> ls2 = LineSegment(Point((5,1)), Point((10,1)))
>>> ls.intersect(ls2)
False
>>> ls2 = LineSegment(Point((7,-1)), Point((7,2)))
>>> ls.intersect(ls2)
True

is_ccw(pt)[source]¶
Returns whether a point is counterclockwise of the segment. Exclusive.
is_ccw(Point) -> bool
Test tag: <tc>#is#LineSegment.is_ccw</tc> Test tag: <tc>#tests#LineSegment.is_ccw</tc>
Parameters: pt (point lying ccw or cw of a segment) –
Examples
>>> ls = LineSegment(Point((0, 0)), Point((5, 0)))
>>> ls.is_ccw(Point((2, 2)))
True
>>> ls.is_ccw(Point((2, -2)))
False

is_cw(pt)[source]¶
Returns whether a point is clockwise of the segment. Exclusive.
is_cw(Point) -> bool
Test tag: <tc>#is#LineSegment.is_cw</tc> Test tag: <tc>#tests#LineSegment.is_cw</tc>
Parameters: pt (point lying ccw or cw of a segment) –
Examples
>>> ls = LineSegment(Point((0, 0)), Point((5, 0)))
>>> ls.is_cw(Point((2, 2)))
False
>>> ls.is_cw(Point((2, -2)))
True

len¶
Returns the length of a LineSegment object.
Test tag: <tc>#is#LineSegment.len</tc> Test tag: <tc>#tests#LineSegment.len</tc>
len() -> number
Examples
>>> ls = LineSegment(Point((2, 2)), Point((5, 2)))
>>> ls.len
3.0

line¶
Returns a Line object of the line which the segment lies on.
Test tag: <tc>#is#LineSegment.line</tc> Test tag: <tc>#tests#LineSegment.line</tc>
line() -> Line
Examples
>>> ls = LineSegment(Point((2, 2)), Point((3, 3)))
>>> l = ls.line
>>> l.m
1.0
>>> l.b
0.0

p1¶
HELPER METHOD. DO NOT CALL.
Returns the p1 attribute of the line segment.
_get_p1() -> Point
Examples
>>> ls = LineSegment(Point((1, 2)), Point((5, 6)))
>>> r = ls._get_p1()
>>> r == Point((1, 2))
True

p2¶
HELPER METHOD. DO NOT CALL.
Returns the p2 attribute of the line segment.
_get_p2() -> Point
Examples
>>> ls = LineSegment(Point((1, 2)), Point((5, 6)))
>>> r = ls._get_p2()
>>> r == Point((5, 6))
True

sw_ccw(pt)[source]¶
Sedgewick test for pt being ccw of segment.
Returns:
 1 if turn from self.p1 to self.p2 to pt is ccw
 -1 if turn from self.p1 to self.p2 to pt is cw
 -1 if the points are collinear and self.p1 is in the middle
 1 if the points are collinear and self.p2 is in the middle
 0 if the points are collinear and pt is in the middle
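The heart of Sedgewick's test is the sign of the cross product of the two displacement vectors. A standalone sketch of that turn test (independent of the LineSegment class, and omitting the collinear subcases):

```python
def turn(p1, p2, pt):
    """Sign of the cross product (p2 - p1) x (pt - p1):
    1 for a counterclockwise turn, -1 for clockwise, 0 for collinear."""
    cross = (p2[0] - p1[0]) * (pt[1] - p1[1]) - (p2[1] - p1[1]) * (pt[0] - p1[0])
    return (cross > 0) - (cross < 0)

print(turn((0, 0), (5, 0), (2, 2)))   # 1: ccw
print(turn((0, 0), (5, 0), (2, -2)))  # -1: cw
print(turn((0, 0), (5, 0), (7, 0)))   # 0: collinear
```

Here 1, -1, and 0 correspond to the ccw, cw, and collinear cases; sw_ccw additionally distinguishes which collinear point lies in the middle.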


class pysal.cg.shapes.Line(m, b)[source]¶
Geometric representation of line objects.

m¶
float – slope

b¶
float – y-intercept


class pysal.cg.shapes.Ray(origin, second_p)[source]¶
Geometric representation of ray objects.

o¶
Point – Origin (point where ray originates)

p¶
Point – Second point on the ray (not the point where the ray originates)


class pysal.cg.shapes.Chain(vertices)[source]¶
Geometric representation of a chain, also known as a polyline.

vertices¶
list – List of Points of the vertices of the chain in order.

len¶
float – The geometric length of the chain.

arclen¶
Returns the geometric length of the chain computed using arc distance (meters).
len -> number
Examples
bounding_box¶
Returns the bounding box of the chain.
bounding_box -> Rectangle
Examples
>>> c = Chain([Point((0, 0)), Point((2, 0)), Point((2, 1)), Point((0, 1))])
>>> c.bounding_box.left
0.0
>>> c.bounding_box.lower
0.0
>>> c.bounding_box.right
2.0
>>> c.bounding_box.upper
1.0

len¶
Returns the geometric length of the chain.
len -> number
Examples
>>> c = Chain([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((2, 1))])
>>> c.len
3.0
>>> c = Chain([[Point((0, 0)), Point((1, 0)), Point((1, 1))],[Point((10,10)),Point((11,10)),Point((11,11))]])
>>> c.len
4.0

parts
¶ Returns the parts of the chain.
parts -> Point list
Examples
>>> c = Chain([[Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))], [Point((2, 1)), Point((2, 2)), Point((1, 2)), Point((1, 1))]])
>>> len(c.parts)
2

segments
¶ Returns the segments that compose the Chain

vertices
Returns the vertices of the chain in clockwise order.
vertices -> Point list
Examples
>>> c = Chain([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((2, 1))])
>>> verts = c.vertices
>>> len(verts)
4


class
pysal.cg.shapes.
Polygon
(vertices, holes=None)[source]¶ Geometric representation of polygon objects.

vertices
¶ list – List of Points with the vertices of the Polygon in clockwise order

len
¶ int – Number of vertices including holes

perimeter
¶ float – Geometric length of the perimeter of the Polygon

bounding_box
¶ Rectangle – Bounding box of the polygon

bbox
¶ List – [left, lower, right, upper]

area
¶ float – Area enclosed by the polygon

centroid
¶ tuple – The ‘center of gravity’, i.e. the mean point of the polygon.

area
Returns the area of the polygon.
area -> number
Examples
>>> p = Polygon([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))])
>>> p.area
1.0
>>> p = Polygon([Point((0, 0)), Point((10, 0)), Point((10, 10)), Point((0, 10))], [Point((2, 1)), Point((2, 2)), Point((1, 2)), Point((1, 1))])
>>> p.area
99.0

bbox
Returns the bounding box of the polygon as a list
See also bounding_box

bounding_box
Returns the bounding box of the polygon.
bounding_box -> Rectangle
Examples
>>> p = Polygon([Point((0, 0)), Point((2, 0)), Point((2, 1)), Point((0, 1))])
>>> p.bounding_box.left
0.0
>>> p.bounding_box.lower
0.0
>>> p.bounding_box.right
2.0
>>> p.bounding_box.upper
1.0

centroid
Returns the centroid of the polygon
centroid -> Point
Notes
The centroid returned by this method is the geometric centroid and respects multipart polygons with holes. Also known as the ‘center of gravity’ or ‘center of mass’.
Examples
>>> p = Polygon([Point((0, 0)), Point((10, 0)), Point((10, 10)), Point((0, 10))], [Point((1, 1)), Point((1, 2)), Point((2, 2)), Point((2, 1))])
>>> p.centroid
(5.0353535353535355, 5.0353535353535355)
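For a single ring without holes, a centroid of this kind can be computed with the standard shoelace formula; the helper below is a hypothetical sketch and does not handle the multipart/hole bookkeeping the real method performs:

```python
def ring_area_centroid(ring):
    """Signed area and centroid of a simple ring via the shoelace formula
    (hypothetical helper, single ring only)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # twice the signed area of triangle (0, i, i+1)
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return a, (cx / (6.0 * a), cy / (6.0 * a))

# A 10 x 10 square: area 100, centroid at its center.
area, centroid = ring_area_centroid([(0, 0), (10, 0), (10, 10), (0, 10)])
```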

contains_point
(point)[source]¶ Test if polygon contains point
Examples
>>> p = Polygon([Point((0, 0)), Point((4, 0)), Point((4, 5)), Point((2, 3)), Point((0, 5))])
>>> p.contains_point((3, 3))
1
>>> p.contains_point((0, 6))
0
>>> p.contains_point((2, 2.9))
1
>>> p.contains_point((4, 5))
0
>>> p.contains_point((4, 0))
0
Handles holes
>>> p = Polygon([Point((0, 0)), Point((0, 10)), Point((10, 10)), Point((10, 0))], [Point((2, 2)), Point((4, 2)), Point((4, 4)), Point((2, 4))])
>>> p.contains_point((3.0, 3.0))
False
>>> p.contains_point((1.0, 1.0))
True
Notes
Points falling exactly on polygon edges may yield unpredictable results
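A common way to implement such a test is even-odd ray casting; the sketch below is a simplified single-ring version (hypothetical helper, not PySAL's implementation), and it shares the caveat above about points falling exactly on an edge:

```python
def point_in_ring(pt, ring):
    """Even-odd ray-casting containment test for a simple ring
    (hypothetical helper; behavior on boundary points is unreliable)."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y0 > y) != (y1 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside
```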

holes
¶ Returns the holes of the polygon in clockwise order.
holes -> Point list
Examples
>>> p = Polygon([Point((0, 0)), Point((10, 0)), Point((10, 10)), Point((0, 10))], [Point((1, 2)), Point((2, 2)), Point((2, 1)), Point((1, 1))])
>>> len(p.holes)
1

len
Returns the number of vertices in the polygon.
len -> int
Examples
>>> p1 = Polygon([Point((0, 0)), Point((0, 1)), Point((1, 1)), Point((1, 0))])
>>> p1.len
4
>>> len(p1)
4

parts
¶ Returns the parts of the polygon in clockwise order.
parts -> Point list
Examples
>>> p = Polygon([[Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))], [Point((2, 1)), Point((2, 2)), Point((1, 2)), Point((1, 1))]])
>>> len(p.parts)
2

perimeter
Returns the perimeter of the polygon.
perimeter -> number
Examples
>>> p = Polygon([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))])
>>> p.perimeter
4.0

vertices
Returns the vertices of the polygon in clockwise order.
vertices -> Point list
Examples
>>> p1 = Polygon([Point((0, 0)), Point((0, 1)), Point((1, 1)), Point((1, 0))])
>>> len(p1.vertices)
4


class
pysal.cg.shapes.
Rectangle
(left, lower, right, upper)[source]¶ Geometric representation of rectangle objects.

left
¶ float – Minimum x-value of the rectangle

lower
¶ float – Minimum y-value of the rectangle

right
¶ float – Maximum x-value of the rectangle

upper
¶ float – Maximum y-value of the rectangle

area
¶ Returns the area of the Rectangle.
area -> number
Examples
>>> r = Rectangle(0, 0, 4, 4)
>>> r.area
16.0

height
¶ Returns the height of the Rectangle.
height -> number
Examples
>>> r = Rectangle(0, 0, 4, 4)
>>> r.height
4.0

set_centroid
(new_center)[source]¶ Moves the rectangle center to a new specified point.
set_centroid(Point) -> Point
Parameters: new_center (the new location of the centroid of the polygon) –
Examples
>>> r = Rectangle(0, 0, 4, 4)
>>> r.set_centroid(Point((4, 4)))
>>> r.left
2.0
>>> r.right
6.0
>>> r.lower
2.0
>>> r.upper
6.0

set_scale
(scale)[source]¶ Rescales the rectangle around its center.
set_scale(number) -> number
Parameters: scale (the ratio of the new scale to the old scale (e.g. 1.0 is current size)) –
Examples
>>> r = Rectangle(0, 0, 4, 4)
>>> r.set_scale(2)
>>> r.left
-2.0
>>> r.right
6.0
>>> r.lower
-2.0
>>> r.upper
6.0

width
¶ Returns the width of the Rectangle.
width -> number
Examples
>>> r = Rectangle(0, 0, 4, 4)
>>> r.width
4.0

cg.standalone
— Standalone¶
The cg.standalone
module provides ...
New in version 1.0.
Helper functions for computational geometry in PySAL

pysal.cg.standalone.
bbcommon
(bb, bbother)[source]¶ Old STARS method for bounding box overlap testing. Also defined in pysal.weights._cont_binning.
Examples
>>> b0 = [0, 0, 10, 10]
>>> b1 = [10, 0, 20, 10]
>>> bbcommon(b0, b1)
1
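The overlap test amounts to comparing the interval extents of the two [left, lower, right, upper] boxes; a hypothetical equivalent (note that, as in the example above, boxes that merely touch count as common):

```python
def bb_overlap(bb, bbother):
    """1 if two [left, lower, right, upper] boxes touch or overlap, else 0
    (hypothetical helper)."""
    horizontal = bb[2] >= bbother[0] and bbother[2] >= bb[0]
    vertical = bb[3] >= bbother[1] and bbother[3] >= bb[1]
    return int(horizontal and vertical)
```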

pysal.cg.standalone.
get_bounding_box
(items)[source]¶ Examples
>>> bb = get_bounding_box([Point((1, 5)), Rectangle(0, 6, 11, 12)])
>>> bb.left
1.0
>>> bb.lower
5.0
>>> bb.right
11.0
>>> bb.upper
12.0

pysal.cg.standalone.
get_angle_between
(ray1, ray2)[source]¶ Returns the angle formed between a pair of rays which share an origin.
get_angle_between(Ray, Ray) -> number
Parameters:  ray1 (a ray forming the beginning of the angle measured) –
 ray2 (a ray forming the end of the angle measured) –
Examples
>>> get_angle_between(Ray(Point((0, 0)), Point((1, 0))), Ray(Point((0, 0)), Point((1, 0))))
0.0

pysal.cg.standalone.
is_collinear
(p1, p2, p3)[source]¶ Returns whether a triplet of points is collinear.
is_collinear(Point, Point, Point) -> bool
Examples
>>> is_collinear(Point((0, 0)), Point((1, 1)), Point((5, 5)))
True
>>> is_collinear(Point((0, 0)), Point((1, 1)), Point((5, 0)))
False
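Collinearity of three points reduces to a zero cross product; a minimal sketch with a tolerance for floating-point input (hypothetical helper, not PySAL's implementation):

```python
def collinear(p1, p2, p3, tol=1e-12):
    """True when the cross product of (p2 - p1) and (p3 - p1) is (near) zero
    (hypothetical helper)."""
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    return abs(cross) <= tol
```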

pysal.cg.standalone.
get_segments_intersect
(seg1, seg2)[source]¶ Returns the intersection of two segments.
get_segments_intersect(LineSegment, LineSegment) -> Point or LineSegment
Parameters:  seg1 (a segment to check intersection for) –
 seg2 (a segment to check intersection for) –
Examples
>>> seg1 = LineSegment(Point((0, 0)), Point((0, 10)))
>>> seg2 = LineSegment(Point((-5, 5)), Point((5, 5)))
>>> i = get_segments_intersect(seg1, seg2)
>>> isinstance(i, Point)
True
>>> str(i)
'(0.0, 5.0)'
>>> seg3 = LineSegment(Point((100, 100)), Point((100, 101)))
>>> i = get_segments_intersect(seg2, seg3)

pysal.cg.standalone.
get_segment_point_intersect
(seg, pt)[source]¶ Returns the intersection of a segment and point.
get_segment_point_intersect(LineSegment, Point) -> Point
Parameters:  seg (a segment to check intersection for) –
 pt (a point to check intersection for) –
Examples
>>> seg = LineSegment(Point((0, 0)), Point((0, 10)))
>>> pt = Point((0, 5))
>>> i = get_segment_point_intersect(seg, pt)
>>> str(i)
'(0.0, 5.0)'
>>> pt2 = Point((5, 5))
>>> get_segment_point_intersect(seg, pt2)

pysal.cg.standalone.
get_polygon_point_intersect
(poly, pt)[source]¶ Returns the intersection of a polygon and point.
get_polygon_point_intersect(Polygon, Point) -> Point
Parameters:  poly (a polygon to check intersection for) –
 pt (a point to check intersection for) –
Examples
>>> poly = Polygon([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))])
>>> pt = Point((0.5, 0.5))
>>> i = get_polygon_point_intersect(poly, pt)
>>> str(i)
'(0.5, 0.5)'
>>> pt2 = Point((2, 2))
>>> get_polygon_point_intersect(poly, pt2)

pysal.cg.standalone.
get_rectangle_point_intersect
(rect, pt)[source]¶ Returns the intersection of a rectangle and point.
get_rectangle_point_intersect(Rectangle, Point) -> Point
Parameters:  rect (a rectangle to check intersection for) –
 pt (a point to check intersection for) –
Examples
>>> rect = Rectangle(0, 0, 5, 5)
>>> pt = Point((1, 1))
>>> i = get_rectangle_point_intersect(rect, pt)
>>> str(i)
'(1.0, 1.0)'
>>> pt2 = Point((10, 10))
>>> get_rectangle_point_intersect(rect, pt2)

pysal.cg.standalone.
get_ray_segment_intersect
(ray, seg)[source]¶ Returns the intersection of a ray and line segment.
get_ray_segment_intersect(Ray, LineSegment) -> Point or LineSegment
Parameters:  ray (a ray to check intersection for) –
 seg (a line segment to check intersection for) –
Examples
>>> ray = Ray(Point((0, 0)), Point((0, 1)))
>>> seg = LineSegment(Point((-1, 10)), Point((1, 10)))
>>> i = get_ray_segment_intersect(ray, seg)
>>> isinstance(i, Point)
True
>>> str(i)
'(0.0, 10.0)'
>>> seg2 = LineSegment(Point((10, 10)), Point((10, 11)))
>>> get_ray_segment_intersect(ray, seg2)

pysal.cg.standalone.
get_rectangle_rectangle_intersection
(r0, r1, checkOverlap=True)[source]¶ Returns the intersection between two rectangles.
Note: The algorithm assumes the rectangles overlap; checkOverlap=False should be used with extreme caution.
get_rectangle_rectangle_intersection(r0, r1) -> Rectangle, Segment, Point or None
Parameters:  r0 (a Rectangle) –
 r1 (a Rectangle) –
Examples
>>> r0 = Rectangle(0, 4, 6, 9)
>>> r1 = Rectangle(4, 0, 9, 7)
>>> ri = get_rectangle_rectangle_intersection(r0, r1)
>>> ri[:]
[4.0, 4.0, 6.0, 7.0]
>>> r0 = Rectangle(0, 0, 4, 4)
>>> r1 = Rectangle(2, 1, 6, 3)
>>> ri = get_rectangle_rectangle_intersection(r0, r1)
>>> ri[:]
[2.0, 1.0, 4.0, 3.0]
>>> r0 = Rectangle(0, 0, 4, 4)
>>> r1 = Rectangle(2, 1, 3, 2)
>>> ri = get_rectangle_rectangle_intersection(r0, r1)
>>> ri[:] == r1[:]
True

pysal.cg.standalone.
get_polygon_point_dist
(poly, pt)[source]¶ Returns the distance between a polygon and point.
get_polygon_point_dist(Polygon, Point) -> number
Parameters:  poly (a polygon to compute distance from) –
 pt (a point to compute distance from) –
Examples
>>> poly = Polygon([Point((0, 0)), Point((1, 0)), Point((1, 1)), Point((0, 1))])
>>> pt = Point((2, 0.5))
>>> get_polygon_point_dist(poly, pt)
1.0
>>> pt2 = Point((0.5, 0.5))
>>> get_polygon_point_dist(poly, pt2)
0.0

pysal.cg.standalone.
get_points_dist
(pt1, pt2)[source]¶ Returns the distance between a pair of points.
get_points_dist(Point, Point) -> number
Parameters:  pt1 (a point) –
 pt2 (the other point) –
Examples
>>> get_points_dist(Point((4, 4)), Point((4, 8)))
4.0
>>> get_points_dist(Point((0, 0)), Point((0, 0)))
0.0

pysal.cg.standalone.
get_segment_point_dist
(seg, pt)[source]¶ Returns the distance between a line segment and a point, along with the position of the closest point on the segment, expressed as a ratio of the length of the segment.
get_segment_point_dist(LineSegment, Point) -> (number, number)
Parameters:  seg (a line segment to compute distance from) –
 pt (a point to compute distance from) –
Examples
>>> seg = LineSegment(Point((0, 0)), Point((10, 0)))
>>> pt = Point((5, 5))
>>> get_segment_point_dist(seg, pt)
(5.0, 0.5)
>>> pt2 = Point((0, 0))
>>> get_segment_point_dist(seg, pt2)
(0.0, 0.0)
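The (distance, ratio) pair can be derived by projecting the point onto the segment and clamping the projection parameter to [0, 1]; a hypothetical sketch (not PySAL's implementation):

```python
import math

def segment_point_dist(a, b, p):
    """Distance from p to segment ab and the clamped projection ratio along ab
    (hypothetical helper)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Projection parameter of p onto the infinite line through a and b,
    # clamped to [0, 1] so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), t
```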

pysal.cg.standalone.
get_point_at_angle_and_dist
(ray, angle, dist)[source]¶ Returns the point at a distance and angle relative to the origin of a ray.
get_point_at_angle_and_dist(Ray, number, number) -> Point
Parameters:  ray (the ray which the angle and distance are relative to) –
 angle (the angle relative to the ray at which the point is located) –
 dist (the distance from the ray origin at which the point is located) –
Examples
>>> ray = Ray(Point((0, 0)), Point((1, 0)))
>>> pt = get_point_at_angle_and_dist(ray, math.pi, 1.0)
>>> isinstance(pt, Point)
True
>>> round(pt[0], 8)
-1.0
>>> round(pt[1], 8)
0.0

pysal.cg.standalone.
convex_hull
(points)[source]¶ Returns the convex hull of a set of points.
convex_hull(Point list) -> Polygon
Parameters: points (a list of points to compute the convex hull for) –
Examples
>>> points = [Point((0, 0)), Point((4, 4)), Point((4, 0)), Point((3, 1))]
>>> convex_hull(points)
[(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]

pysal.cg.standalone.
is_clockwise
(vertices)[source]¶ Returns whether a list of points describing a polygon are clockwise or counterclockwise.
is_clockwise(Point list) -> bool
Parameters: vertices (a list of points that form a single ring) –
Examples
>>> is_clockwise([Point((0, 0)), Point((10, 0)), Point((0, 10))])
False
>>> is_clockwise([Point((0, 0)), Point((0, 10)), Point((10, 0))])
True
>>> v = [(-106.57798, 35.174143999999998), (-106.583412, 35.174141999999996), (-106.58417999999999, 35.174143000000001), (-106.58377999999999, 35.175542999999998),
... (-106.58287999999999, 35.180543), (-106.58263099999999, 35.181455), (-106.58257999999999, 35.181643000000001), (-106.58198299999999, 35.184615000000001),
... (-106.58148, 35.187242999999995), (-106.58127999999999, 35.188243), (-106.58138, 35.188243), (-106.58108, 35.189442999999997),
... (-106.58104, 35.189644000000001), (-106.58028, 35.193442999999995), (-106.580029, 35.194541000000001), (-106.57974399999999, 35.195785999999998),
... (-106.579475, 35.196961999999999), (-106.57922699999999, 35.198042999999998), (-106.578397, 35.201665999999996), (-106.57827999999999, 35.201642999999997),
... (-106.57737999999999, 35.201642999999997), (-106.57697999999999, 35.201543000000001), (-106.56436599999999, 35.200311999999997), (-106.56058, 35.199942999999998),
... (-106.56048, 35.197342999999996), (-106.56048, 35.195842999999996), (-106.56048, 35.194342999999996), (-106.56048, 35.193142999999999),
... (-106.56048, 35.191873999999999), (-106.56048, 35.191742999999995), (-106.56048, 35.190242999999995), (-106.56037999999999, 35.188642999999999),
... (-106.56037999999999, 35.187242999999995), (-106.56037999999999, 35.186842999999996), (-106.56037999999999, 35.186552999999996), (-106.56037999999999, 35.185842999999998),
... (-106.56037999999999, 35.184443000000002), (-106.56037999999999, 35.182943000000002), (-106.56037999999999, 35.181342999999998), (-106.56037999999999, 35.180433000000001),
... (-106.56037999999999, 35.179943000000002), (-106.56037999999999, 35.178542999999998), (-106.56037999999999, 35.177790999999999), (-106.56037999999999, 35.177143999999998),
... (-106.56037999999999, 35.175643999999998), (-106.56037999999999, 35.174444000000001), (-106.56037999999999, 35.174043999999995), (-106.560526, 35.174043999999995),
... (-106.56478, 35.174043999999995), (-106.56627999999999, 35.174143999999998), (-106.566541, 35.174144999999996), (-106.569023, 35.174157000000001),
... (-106.56917199999999, 35.174157999999998), (-106.56938, 35.174143999999998), (-106.57061499999999, 35.174143999999998), (-106.57097999999999, 35.174143999999998),
... (-106.57679999999999, 35.174143999999998), (-106.57798, 35.174143999999998)]
>>> is_clockwise(v)
True

pysal.cg.standalone.
point_touches_rectangle
(point, rect)[source]¶ Returns True if the point is in the rectangle or touches its boundary.
point_touches_rectangle(point, rect) -> bool
Examples
>>> rect = Rectangle(0, 0, 10, 10)
>>> a = Point((5, 5))
>>> b = Point((10, 5))
>>> c = Point((11, 11))
>>> point_touches_rectangle(a, rect)
1
>>> point_touches_rectangle(b, rect)
1
>>> point_touches_rectangle(c, rect)
0

pysal.cg.standalone.
get_shared_segments
(poly1, poly2, bool_ret=False)[source]¶ Returns the line segments in common to both polygons.
get_shared_segments(poly1, poly2) -> list
Parameters:  poly1 (a Polygon) –
 poly2 (a Polygon) –
Examples
>>> x = [0, 0, 1, 1]
>>> y = [0, 1, 1, 0]
>>> poly1 = Polygon(map(Point, zip(x, y)))
>>> x = [a + 1 for a in x]
>>> poly2 = Polygon(map(Point, zip(x, y)))
>>> get_shared_segments(poly1, poly2, bool_ret=True)
True

pysal.cg.standalone.
distance_matrix
(X, p=2.0, threshold=50000000.0)[source]¶ Distance Matrices
XXX Needs optimization/integration with other weights in pysal
Parameters:  X (an n by k numpy.ndarray) – where n is the number of observations and k is the number of dimensions (2 for x, y)
 p (float) – Minkowski p-norm distance metric parameter: 1 <= p <= infinity; 2: Euclidean distance; 1: Manhattan distance
 threshold (positive integer) – If (n**2)*32 > threshold, use scipy.spatial.distance_matrix instead of working in RAM; this is roughly the amount of RAM (in bytes) that will be used.
Examples
>>> x, y = [r.flatten() for r in np.indices((3, 3))]
>>> data = np.array([x, y]).T
>>> d = distance_matrix(data)
>>> np.array(d)
array([[ 0.        ,  1.        ,  2.        ,  1.        ,  1.41421356,  2.23606798,  2.        ,  2.23606798,  2.82842712],
       [ 1.        ,  0.        ,  1.        ,  1.41421356,  1.        ,  1.41421356,  2.23606798,  2.        ,  2.23606798],
       [ 2.        ,  1.        ,  0.        ,  2.23606798,  1.41421356,  1.        ,  2.82842712,  2.23606798,  2.        ],
       [ 1.        ,  1.41421356,  2.23606798,  0.        ,  1.        ,  2.        ,  1.        ,  1.41421356,  2.23606798],
       [ 1.41421356,  1.        ,  1.41421356,  1.        ,  0.        ,  1.        ,  1.41421356,  1.        ,  1.41421356],
       [ 2.23606798,  1.41421356,  1.        ,  2.        ,  1.        ,  0.        ,  2.23606798,  1.41421356,  1.        ],
       [ 2.        ,  2.23606798,  2.82842712,  1.        ,  1.41421356,  2.23606798,  0.        ,  1.        ,  2.        ],
       [ 2.23606798,  2.        ,  2.23606798,  1.41421356,  1.        ,  1.41421356,  1.        ,  0.        ,  1.        ],
       [ 2.82842712,  2.23606798,  2.        ,  2.23606798,  1.41421356,  1.        ,  2.        ,  1.        ,  0.        ]])
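For small n, the same pairwise Minkowski distances can be sketched directly with numpy broadcasting (hypothetical helper, without distance_matrix's threshold handling):

```python
import numpy as np

def minkowski_matrix(X, p=2.0):
    """Pairwise Minkowski p-norm distances for an n-by-k coordinate array
    (hypothetical helper; builds the full (n, n, k) difference array in memory)."""
    diff = np.abs(X[:, np.newaxis, :] - X[np.newaxis, :, :])  # shape (n, n, k)
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

# Same 3x3 grid of points as the example above; p=2 gives Euclidean distances.
x, y = [r.flatten() for r in np.indices((3, 3))]
data = np.array([x, y]).T
d = minkowski_matrix(data)
```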
cg.rtree
— rtree¶
The cg.rtree
module provides a pure python rtree.
New in version 1.2.
Pure Python implementation of RTree spatial index
Adaptation of http://code.google.com/p/pyrtree/
R-tree. See doc/ref/r-tree-clustering-split-algo.pdf
cg.kdtree
— KDTree¶
The cg.kdtree
module provides kdtree data structures for PySAL.
New in version 1.3.
KDTree for PySAL: Python Spatial Analysis Library.
Adds support for Arc Distance to scipy.spatial.KDTree.

pysal.cg.kdtree.
KDTree
(data, leafsize=10, distance_metric='Euclidean', radius=6371.0)[source]¶ kd-tree built on top of the kd-tree functionality in scipy. If using scipy 0.12 or greater, uses scipy.spatial.cKDTree; otherwise uses scipy.spatial.KDTree. Offers both Arc distance and Euclidean distance. Note that Arc distance is only appropriate when the points are in latitude and longitude, and the radius is set to a meaningful value (see docs below).
Parameters:  data (array) – The data points to be indexed. This array is not copied, so modifying this data will result in bogus results. Typically n by 2.
 leafsize (int) – The number of points at which the algorithm switches over to brute-force. Has to be positive. Optional, default is 10.
 distance_metric (string) – Options: “Euclidean” (default) and “Arc”.
 radius (float) – Radius of the sphere on which to compute distances. Assumes data are in latitude and longitude. Ignored if distance_metric=”Euclidean”. Typical values: pysal.cg.RADIUS_EARTH_KM (default), pysal.cg.RADIUS_EARTH_MILES
cg.sphere
— Sphere¶
The cg.sphere
module provides tools for working with spherical distances.
New in version 1.3.
sphere: Tools for working with spherical geometry.
Author(s):
Charles R Schmidt schmidtc@gmail.com
Luc Anselin luc.anselin@asu.edu
Xun Li xun.li@asu.edu

pysal.cg.sphere.
arcdist
(pt0, pt1, radius=6371.0)[source]¶
Parameters:  pt0 (point) – assumed to be in form (lng, lat)
 pt1 (point) – assumed to be in form (lng, lat)
 radius (radius of the sphere) – defaults to Earth’s radius. Source: http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
Returns: the arc distance between pt0 and pt1 using the supplied radius
Examples
>>> pt0 = (0, 0)
>>> pt1 = (180, 0)
>>> d = arcdist(pt0, pt1, RADIUS_EARTH_MILES)
>>> d == math.pi * RADIUS_EARTH_MILES
True

pysal.cg.sphere.
arcdist2linear
(arc_dist, radius=6371.0)[source]¶ Convert an arc distance (spherical earth) to a linear distance (R3) in the unit sphere.
Examples
>>> pt0 = (0, 0)
>>> pt1 = (180, 0)
>>> d = arcdist(pt0, pt1, RADIUS_EARTH_MILES)
>>> d == math.pi * RADIUS_EARTH_MILES
True
>>> arcdist2linear(d, RADIUS_EARTH_MILES)
2.0
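The conversion is the chord length subtended on the unit sphere by the central angle arc_dist / radius; a hypothetical sketch (not PySAL's implementation):

```python
import math

def arc_to_chord(arc_dist, radius):
    """Chord length on the unit sphere for an arc of length arc_dist on a
    sphere of the given radius (hypothetical helper)."""
    theta = arc_dist / radius          # central angle in radians
    return 2.0 * math.sin(theta / 2.0)

# Half the circumference (antipodal points) maps to the unit-sphere diameter, 2.0.
chord = arc_to_chord(math.pi * 3959.0, 3959.0)
```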

pysal.cg.sphere.
fast_knn
(pts, k, return_dist=False)[source]¶ Computes k nearest neighbors on a sphere.
Parameters: Returns:  wn (list) – list of neighbors
 wd (list) – list of neighbor distances (optional)

pysal.cg.sphere.
linear2arcdist
(linear_dist, radius=6371.0)[source]¶ Convert a linear distance in the unit sphere (R3) to an arc distance based on supplied radius
Examples
>>> pt0 = (0, 0)
>>> pt1 = (180, 0)
>>> d = arcdist(pt0, pt1, RADIUS_EARTH_MILES)
>>> d == linear2arcdist(2.0, radius=RADIUS_EARTH_MILES)
True

pysal.cg.sphere.
toXYZ
(pt)[source]¶ Parameters: pt (point) – assumed to be in form (lng, lat)
Returns: the (x, y, z) coordinates of pt on the sphere

pysal.cg.sphere.
lonlat
(pointslist)[source]¶ Converts point order from lat-lon tuples to lon-lat (x, y) tuples
Parameters: pointslist (list of lat-lon tuples; note, has to be a list, even for one point) –
Returns: newpts
Return type: list with tuples of points in lon-lat order
Example
>>> points = [(41.981417, -87.893517), (41.980396, -87.776787), (41.980906, -87.696450)]
>>> newpoints = lonlat(points)
>>> newpoints
[(-87.893517, 41.981417), (-87.776787, 41.980396), (-87.69645, 41.980906)]

pysal.cg.sphere.
harcdist
(p0, p1, lonx=True, radius=6371.0)[source]¶ Alternative arc distance function, uses the haversine formula
Parameters:  p0 (first point as a tuple in decimal degrees) –
 p1 (second point as a tuple in decimal degrees) –
 lonx (boolean to assess the order of the coordinates) – for lon,lat (default) = True, for lat,lon = False
 radius (radius of the earth at the equator as a sphere) – default: RADIUS_EARTH_KM (6371.0 km); options: RADIUS_EARTH_MILES (3959.0 miles); None (for result in radians)
Returns: d
Return type: distance in units specified: km, miles, or radians (for None)
Example
>>> p0 = (-87.893517, 41.981417)
>>> p1 = (-87.519295, 41.657498)
>>> harcdist(p0, p1)
47.52873002976876
>>> harcdist(p0, p1, radius=None)
0.007460167953189258
Note
Uses radangle function to compute radian angle
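A self-contained sketch of the haversine formula that harcdist is described as using (hypothetical helper; default radius in km as above, points given as (lng, lat) in decimal degrees):

```python
import math

def haversine(pt0, pt1, radius=6371.0):
    """Great-circle distance between two (lng, lat) points in decimal degrees
    via the haversine formula (hypothetical helper)."""
    lng0, lat0, lng1, lat1 = map(math.radians, (*pt0, *pt1))
    dlat, dlng = lat1 - lat0, lng1 - lng0
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat0) * math.cos(lat1) * math.sin(dlng / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Antipodal points on the equator: half the Earth's circumference.
d = haversine((0, 0), (180, 0))
```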

pysal.cg.sphere.
geointerpolate
(p0, p1, t, lonx=True)[source]¶ Finds a point on a sphere along the great circle between two points on the sphere; also known as a waypoint in great circle navigation
Parameters:  p0 (first point as a tuple in decimal degrees) –
 p1 (second point as a tuple in decimal degrees) –
 t (proportion along great circle distance between p0 and p1) – e.g., t=0.5 would find the midpoint
 lonx (boolean to assess the order of the coordinates,) – for lon,lat (default) = True, for lat,lon = False
Returns: x, y – in the same order as the input, depending on the setting of lonx
Return type: tuple in decimal degrees of lon-lat (default) or lat-lon
Example
>>> p0 = (-87.893517, 41.981417)
>>> p1 = (-87.519295, 41.657498)
>>> geointerpolate(p0, p1, 0.1)              # using lon-lat
(-87.85592403438788, 41.949079912574796)
>>> p3 = (41.981417, -87.893517)
>>> p4 = (41.657498, -87.519295)
>>> geointerpolate(p3, p4, 0.1, lonx=False)  # using lat-lon
(41.949079912574796, -87.85592403438788)

pysal.cg.sphere.
geogrid
(pup, pdown, k, lonx=True)[source]¶ Computes a k+1 by k+1 set of grid points for a bounding box in lat-lon; uses geointerpolate
Parameters:  pup (tuple with latlon or lonlat for upper left corner of bounding box) –
 pdown (tuple with latlon or lonlat for lower right corner of bounding box) –
 k (number of grid cells (grid points will be one more)) –
 lonx (boolean to assess the order of the coordinates,) – for lon,lat (default) = True, for lat,lon = False
Returns: grid – starting with the top row and moving to the bottom; coordinate tuples are returned in same order as input
Return type: list of tuples with latlon or lonlat for grid points, row by row,
Example
>>> pup = (42.023768, -87.946389)    # Arlington Heights, IL
>>> pdown = (41.644415, -87.524102)  # Hammond, IN
>>> geogrid(pup, pdown, 3, lonx=False)
[(42.023768, -87.946389), (42.02393997819538, -87.80562679358316), (42.02393997819538, -87.66486420641684), (42.023768, -87.524102), (41.897317, -87.94638900000001), (41.8974888973743, -87.80562679296166), (41.8974888973743, -87.66486420703835), (41.897317, -87.524102), (41.770866000000005, -87.94638900000001), (41.77103781320412, -87.80562679234043), (41.77103781320412, -87.66486420765956), (41.770866000000005, -87.524102), (41.644415, -87.946389), (41.64458672568646, -87.80562679171955), (41.64458672568646, -87.66486420828045), (41.644415, -87.524102)]
pysal.core
— Core Data Structures and IO¶
Tables
– DataTable Extension¶
New in version 1.0.

class
pysal.core.Tables.
DataTable
(*args, **kwargs)[source]¶ DataTable provides additional functionality to FileIO for data table files. FileIO handlers that provide data tables should subclass this instead of FileIO.

by_col
¶

by_col_array
(*args)[source]¶ Return columns of table as a numpy array.
Parameters: *args (any number of strings of length k) – names of variables to extract
Returns: implicit
Return type: numpy array of shape (n, k)
Notes
If the variables are not all of the same data type, then numpy rules for casting will result in a uniform type applied to all variables.
If only strings are passed to the function, then an array with those columns will be constructed.
If only one list of strings is passed, the output is identical to those strings being passed.
If at least one list is passed and other strings or lists are passed, this returns a tuple containing arrays constructed from each positional argument.
Examples
>>> import pysal as ps >>> dbf = ps.open(ps.examples.get_path('NAT.dbf')) >>> hr = dbf.by_col_array('HR70', 'HR80') >>> hr[0:5] array([[ 0. , 8.85582713], [ 0. , 17.20874204], [ 1.91515848, 3.4507747 ], [ 1.28864319, 3.26381409], [ 0. , 7.77000777]]) >>> hr = dbf.by_col_array(['HR80', 'HR70']) >>> hr[0:5] array([[ 8.85582713, 0. ], [ 17.20874204, 0. ], [ 3.4507747 , 1.91515848], [ 3.26381409, 1.28864319], [ 7.77000777, 0. ]]) >>> hr = dbf.by_col_array(['HR80']) >>> hr[0:5] array([[ 8.85582713], [ 17.20874204], [ 3.4507747 ], [ 3.26381409], [ 7.77000777]])
Numpy only supports homogeneous arrays. See Notes above.
>>> hr = dbf.by_col_array('STATE_NAME', 'HR80') >>> hr[0:5] array([['Minnesota', '8.8558271343'], ['Washington', '17.208742041'], ['Washington', '3.4507746989'], ['Washington', '3.2638140931'], ['Washington', '7.77000777']], dtype='S20')
>>> y, X = dbf.by_col_array('STATE_NAME', ['HR80', 'HR70']) >>> y[0:5] array([['Minnesota'], ['Washington'], ['Washington'], ['Washington'], ['Washington']], dtype='S20') >>> X[0:5] array([[ 8.85582713, 0. ], [ 17.20874204, 0. ], [ 3.4507747 , 1.91515848], [ 3.26381409, 1.28864319], [ 7.77000777, 0. ]])

FileIO
– File Input/Output System¶
New in version 1.0.
FileIO: Module for reading and writing various file types in a Pythonic way. This module should not be used directly; instead:
import pysal.core.FileIO as FileIO
Readers and Writers will mimic python file objects.
.seek(n) seeks to the n’th object
.read(n) reads n objects, default == all
.next() reads the next object

class
pysal.core.FileIO.
FileIO
(dataPath='', mode='r', dataFormat=None)[source]¶ How this works: FileIO.open(*args) == FileIO(*args). When creating a new instance of FileIO, the .__new__ method intercepts the call. .__new__ parses the filename to determine the fileType; next, the .__registry is checked for that type. Each type supports one or more modes ['r', 'w', 'a', etc.]. If we support the type and mode, an instance of the appropriate handler is created and returned. All handlers must inherit from this class, and by doing so are automatically added to the .__registry and are forced to conform to the prescribed API. The metaclass takes care of the registration by parsing the class definition. It doesn’t make much sense to treat weights in the same way as shapefiles and dbfs, so for now we’ll just return an instance of W on mode='r'; on mode='w', .write will expect an instance of W.

by_row
¶

get
(n)[source]¶ Seeks the file to n and returns n. If .ids is set, n should be an id; else, n should be an offset.

static
getType
(dataPath, mode, dataFormat=None)[source]¶ Parse the dataPath and return the data type

ids
¶

rIds
¶

read
(n=1)[source]¶ Read at most n objects, less if read hits EOF. If size is negative or omitted, read all objects until EOF. Returns None if EOF is reached before any objects.

pysal.core.IOHandlers
— Input Output Handlers¶
IOHandlers.arcgis_dbf
– ArcGIS DBF plugin¶
New in version 1.2.

class
pysal.core.IOHandlers.arcgis_dbf.
ArcGISDbfIO
(*args, **kwargs)[source]¶ Opens, reads, and writes weights file objects in ArcGIS dbf format.
Spatial weights objects in the ArcGIS dbf format are used in ArcGIS Spatial Statistics tools. This format is the same as the general dbf format, but the structure of the weights dbf file is fixed unlike other dbf files. This dbf format can be used with the “Generate Spatial Weights Matrix” tool, but not with the tools under the “Mapping Clusters” category.
The ArcGIS dbf file is assumed to have three or four data columns. When the file has four columns, the first column is meaningless and will be ignored in PySAL during both file reading and file writing. The next three columns hold origin IDs, destination IDs, and weight values. When the file has three columns, it is assumed that only these data columns exist in the stated order. The name for the origin IDs column should be the name of the ID variable in the original source data table. The names for the destination IDs and weight values columns are NID and WEIGHT, respectively. ArcGIS Spatial Statistics tools support only unique integer IDs. Therefore, the values for the origin and destination ID columns should be integers. For the case where the IDs of a weights object are not integers, ArcGISDbfIO allows users to use internal id values corresponding to record numbers, instead of original ids.
An exemplary structure of an ArcGIS dbf file is as follows:
[Line 1] Field1 RECORD_ID NID WEIGHT
[Line 2] 0 72 76 1
[Line 3] 0 72 79 1
[Line 4] 0 72 78 1
...
Unlike the ArcGIS text format, this format does not seem to include self-neighbors.
References

FORMATS
= ['arcgis_dbf']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

get
(n)¶ Seeks the file to n and returns n. If .ids is set, n should be an id; else, n should be an offset.

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj, useIdIndex=False)[source]¶ Writes a weights object to the opened dbf file.
Parameters: obj – a weights object to write
Returns: an ArcGIS dbf file
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('arcgis_ohio.dbf'), 'r', 'arcgis_dbf')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.dbf')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w','arcgis_dbf')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created dbf file
>>> wnew = pysal.open(fname, 'r', 'arcgis_dbf').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.arcgis_swm
— ArcGIS SWM plugin¶
New in version 1.2.

class
pysal.core.IOHandlers.arcgis_swm.
ArcGISSwmIO
(*args, **kwargs)[source]¶ Opens, reads, and writes weights file objects in ArcGIS swm format.
Spatial weights objects in the ArcGIS swm format are used in ArcGIS Spatial Statistics tools. Particularly, this format can be directly used with the tools under the category of Mapping Clusters.
The values for [ORG_i] and [DST_i] should be integers, as ArcGIS Spatial Statistics tools support only unique integer IDs. For the case where a weights object uses non-integer IDs, ArcGISSwmIO allows users to use internal ids corresponding to record numbers, instead of original ids.
The specifics of each part of the above structure are as follows:
Part         Data type   Description                            Length
ID_VAR_NAME  ASCII TEXT  ID variable name                       Flexible (up to the 1st ;)
ESRI_SRS     ASCII TEXT  ESRI spatial reference system          Flexible (between the 1st ; and \n)
NO_OBS       l.e. int    Number of observations                 4
ROW_STD      l.e. int    Whether or not row-standardized        4
WGT_i
  ORG_i      l.e. int    ID of observation i                    4
  NO_NGH_i   l.e. int    Number of neighbors for obs. i (m)     4
NGHS_i
  DSTS_i     l.e. int    IDs of all neighbors of obs. i         4*m
  WS_i       l.e. float  Weights for obs. i and its neighbors   8*m
  W_SUM_i    l.e. float  Sum of weights for obs. i              8
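As a sketch of how the little-endian ("l.e.") integer fields in the table above could be decoded with Python's struct module (hypothetical helper, not ArcGISSwmIO's actual reader):

```python
import io
import struct

def read_swm_counts(buf):
    """Decode two consecutive little-endian 4-byte ints, as in the NO_OBS and
    ROW_STD fields described above (hypothetical helper)."""
    n_obs, row_std = struct.unpack('<ll', buf.read(8))
    return n_obs, bool(row_std)

# Fabricated 8-byte payload standing in for a real swm file's header fields.
fake_fields = io.BytesIO(struct.pack('<ll', 88, 1))
```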
FORMATS
= ['swm']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj, useIdIndex=False)[source]¶ Writes a spatial weights matrix data file in swm format.
Parameters: obj – a weights object
Returns: an ArcGIS swm file; writes the weights object to the opened swm file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('ohio.swm'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.swm')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created text file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.arcgis_txt – ArcGIS ASCII plugin¶
New in version 1.2.

class pysal.core.IOHandlers.arcgis_txt.ArcGISTextIO(*args, **kwargs)[source]¶
Opens, reads, and writes weights file objects in ArcGIS ASCII text format.
Spatial weights objects in the ArcGIS text format are used in ArcGIS Spatial Statistics tools. This format is a simple ASCII-encoded text file. It can be used directly with the tools in the “Mapping Clusters” category, but it cannot be used with the “Generate Spatial Weights Matrix” tool.
The first line of the ArcGIS text file is a header containing the name of the data column that held the ID variable in the original source data table. After this header line, the file contains three data columns for origin ID, destination ID, and weight values. ArcGIS Spatial Statistics tools support only unique integer IDs, so the values in the first two columns should be integers. For the case where a weights object uses non-integer IDs, ArcGISTextIO allows users to use internal IDs corresponding to record numbers instead of the original IDs.
An exemplary structure of an ArcGIS text file is as follows:
[Line 1] StationID
[Line 2] 1 1 0.0
[Line 3] 1 2 0.1
[Line 4] 1 3 0.14286
[Line 5] 2 1 0.1
[Line 6] 2 3 0.05
[Line 7] 3 1 0.16667
[Line 8] 3 2 0.06667
[Line 9] 3 3 0.0
...
As shown in the above example, this file format allows explicit specification of weights for self-neighbors. When no entry is available for a self-neighbor, ArcGIS Spatial Statistics tools consider it to have zero weight. The PySAL ArcGISTextIO class ignores self-neighbors whose weights are zero.
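The parsing rules above (three whitespace-separated columns after a one-line header, with zero-weight self-neighbors dropped) can be sketched in plain Python. This is an illustration of the format only, not the actual ArcGISTextIO implementation; parse_arcgis_text is a hypothetical helper:

```python
def parse_arcgis_text(lines):
    """Parse ArcGIS text weights: the first line is the ID variable name,
    the remaining lines are 'origin destination weight' triplets.
    Zero-weight self-neighbors are ignored, per the rule described above."""
    id_var = lines[0].strip()
    neighbors, weights = {}, {}
    for line in lines[1:]:
        o, d, w = line.split()
        o, d, w = int(o), int(d), float(w)
        if o == d and w == 0.0:
            continue  # skip a self-neighbor with zero weight
        neighbors.setdefault(o, []).append(d)
        weights.setdefault(o, []).append(w)
    return id_var, neighbors, weights

sample = ["StationID", "1 1 0.0", "1 2 0.1", "2 1 0.1", "3 1 0.16667"]
id_var, nbrs, wgts = parse_arcgis_text(sample)
```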
References
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Modeling_spatial_relationships
Notes
When there is a dbf file whose name is identical to that of the source text file, ArcGISTextIO checks the data type of the ID data column and uses it for reading and writing the text file. Otherwise, it assumes the IDs are strings.

FORMATS
= ['arcgis_text']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

close
()¶

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶

seek
(pos)¶

shpName
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj, useIdIndex=False)[source]¶
Parameters: obj – a weights object
Returns: an ArcGIS text file; writes the weights object to the opened text file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('arcgis_txt.txt'),'r','arcgis_text')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.txt')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w','arcgis_text')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created text file
>>> wnew = pysal.open(fname,'r','arcgis_text').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.csvWrapper — CSV plugin¶
New in version 1.0.

class pysal.core.IOHandlers.csvWrapper.csvWrapper(*args, **kwargs)[source]¶
DataTable provides additional functionality to FileIO for data table files. FileIO handlers that provide data tables should subclass this instead of FileIO.

FORMATS
= ['csv']¶

MODES
= ['r', 'Ur', 'rU', 'U']¶

READ_MODES
= ['r', 'Ur', 'rU', 'U']¶

by_col
¶

by_col_array
(*args)¶ Return columns of table as a numpy array.
Parameters: *args (any number of strings of length k) – names of variables to extract
Returns: numpy array of shape (n, k)
Notes
If the variables are not all of the same data type, then numpy rules for casting will result in a uniform type applied to all variables.
If only strings are passed to the function, then an array with those columns will be constructed.
If only one list of strings is passed, the output is identical to those strings being passed.
If at least one list is passed and other strings or lists are passed, this returns a tuple containing arrays constructed from each positional argument.
Examples
>>> import pysal as ps
>>> dbf = ps.open(ps.examples.get_path('NAT.dbf'))
>>> hr = dbf.by_col_array('HR70', 'HR80')
>>> hr[0:5]
array([[  0.        ,   8.85582713],
       [  0.        ,  17.20874204],
       [  1.91515848,   3.4507747 ],
       [  1.28864319,   3.26381409],
       [  0.        ,   7.77000777]])
>>> hr = dbf.by_col_array(['HR80', 'HR70'])
>>> hr[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])
>>> hr = dbf.by_col_array(['HR80'])
>>> hr[0:5]
array([[  8.85582713],
       [ 17.20874204],
       [  3.4507747 ],
       [  3.26381409],
       [  7.77000777]])
Numpy only supports homogeneous arrays. See Notes above.
>>> hr = dbf.by_col_array('STATE_NAME', 'HR80')
>>> hr[0:5]
array([['Minnesota', '8.8558271343'],
       ['Washington', '17.208742041'],
       ['Washington', '3.4507746989'],
       ['Washington', '3.2638140931'],
       ['Washington', '7.77000777']], dtype='S20')
>>> y, X = dbf.by_col_array('STATE_NAME', ['HR80', 'HR70'])
>>> y[0:5]
array([['Minnesota'],
       ['Washington'],
       ['Washington'],
       ['Washington'],
       ['Washington']], dtype='S20')
>>> X[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

close
()¶ subclasses should clean themselves up and then call this method

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶ Read at most n objects, less if read hits EOF if size is negative or omitted read all objects until EOF returns None if EOF is reached before any objects.

seek
(n)¶ Seek the FileObj to the beginning of the n’th record, if ids are set, seeks to the beginning of the record at id, n

tell
()¶ Return id (or offset) of next object

to_df
(n=1, read_shp=None, **df_kws)¶

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

write
(obj)¶ Must be implemented by subclasses that support ‘w’ subclasses should increment .pos subclasses should also check if obj is an instance of type(list) and redefine this doc string

IOHandlers.dat — DAT plugin¶
New in version 1.2.

class pysal.core.IOHandlers.dat.DatIO(*args, **kwargs)[source]¶
Opens, reads, and writes file objects in DAT format.
Spatial weights objects in DAT format are used in Dr. LeSage’s MATLAB Econ library. The DAT format is a simple text file with a DAT or dat extension. It has no header line and includes three data columns for origin ID, destination ID, and weight values, as follows:
[Line 1] 2 1 0.25
[Line 2] 5 1 0.50
...
Origin/destination IDs in this file format are simply record numbers starting with 1. IDs are not necessarily integers. Data values for all columns should be numeric.
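As an illustration of the three-column layout (not the DatIO code itself), the sketch below renders a nested weights dictionary as DAT rows; to_dat_lines is a hypothetical helper:

```python
def to_dat_lines(weights):
    """Render {origin: {destination: weight}} as DAT rows of the form
    'origin destination weight', with record numbers starting at 1."""
    lines = []
    for o in sorted(weights):
        for d in sorted(weights[o]):
            lines.append("%d %d %s" % (o, d, weights[o][d]))
    return lines

# Reproduce the two example rows shown above
lines = to_dat_lines({2: {1: 0.25}, 5: {1: 0.5}})
```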

FORMATS
= ['dat']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

close
()¶

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶

seek
(pos)¶

shpName
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj)[source]¶
Parameters: obj – a weights object
Returns: a DAT file; writes the weights object to the opened DAT file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('wmat.dat'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.dat')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created dat file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.gal — GAL plugin¶
New in version 1.0.

class pysal.core.IOHandlers.gal.GalIO(*args, **kwargs)[source]¶
Opens, reads, and writes file objects in GAL format.

FORMATS
= ['gal']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

data_type
¶

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1, sparse=False)[source]¶
Parameters: sparse (boolean) – if True, return a scipy sparse object; if False, return a pysal W object

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

write
(obj)[source]¶
Parameters: obj – a weights object
Returns: a GAL file; writes the weights object to the opened GAL file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('sids2.gal'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.gal')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created gal file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.geobugs_txt — GeoBUGS plugin¶
New in version 1.2.

class pysal.core.IOHandlers.geobugs_txt.GeoBUGSTextIO(*args, **kwargs)[source]¶
Opens, reads, and writes weights file objects in the text format used in GeoBUGS. GeoBUGS generates a spatial weights matrix as an R object and writes it out as an ASCII text representation of the R object.
An exemplary GeoBUGS text file is as follows:
list([CARD],[ADJ],[WGT],[SUMNUMNEIGH])
where [CARD] and [ADJ] are required and the others are optional. PySAL assumes [CARD] and [ADJ] always exist in an input text file. It can read a GeoBUGS text file even when its content is not written in the order of [CARD], [ADJ], [WGT], and [SUMNUMNEIGH]. It always writes all of [CARD], [ADJ], [WGT], and [SUMNUMNEIGH]. PySAL does not apply text wrapping during file writing.
In the above example,
[CARD]:
num=c([a list of comma-separated neighbor cardinalities])
[ADJ]:
adj=c([a list of comma-separated neighbor IDs]); if cardinality is zero, neighbor IDs are skipped. The ordering of observations is the same in both [CARD] and [ADJ]. Neighbor IDs are record numbers starting from one.
[WGT]:
weights=c([a list of comma-separated weights]); the restrictions for [ADJ] also apply to [WGT].
[SUMNUMNEIGH]:
sumNumNeigh=[the total number of neighbor pairs]; an integer value equal to the sum of the neighbor cardinalities.
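The [CARD] and [ADJ] vectors described above can be recovered from the list(...) text with a small amount of plain Python. This sketch rebuilds per-observation neighbor lists; parse_geobugs is a hypothetical helper for illustration, not the GeoBUGSTextIO code:

```python
import re

def parse_geobugs(text):
    """Extract the [CARD] and [ADJ] vectors from a GeoBUGS list(...) string.
    Only num= and adj= are handled here; [WGT] and [SUMNUMNEIGH] are analogous."""
    card = [int(x) for x in re.search(r'num\s*=\s*c\(([^)]*)\)', text).group(1).split(',')]
    adj = [int(x) for x in re.search(r'adj\s*=\s*c\(([^)]*)\)', text).group(1).split(',')]
    # Rebuild neighbor lists: record numbers start at 1; islands have cardinality 0
    neighbors, pos = {}, 0
    for i, m in enumerate(card, start=1):
        neighbors[i] = adj[pos:pos + m]
        pos += m
    return neighbors

# Observation 3 is an island (cardinality 0), so its neighbor IDs are skipped in adj
nbrs = parse_geobugs("list(num=c(2,1,0), adj=c(2,3,1))")
```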
Notes
For files generated by the R spdep nb2WB and dput functions, it is assumed that the value of the control parameter of the dput function is NULL. Please refer to the R spdep nb2WB function help file.
References
Thomas, A., Best, N., Lunn, D., Arnold, R., and Spiegelhalter, D. GeoBUGS User Manual.
R spdep nb2WB function help file.

FORMATS
= ['geobugs_text']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)[source]¶ Reads GeoBUGS text file
Returns: Return type: a pysal.weights.weights.W object Examples
Type ‘dir(w)’ at the interpreter to see what methods are supported. Open a GeoBUGS text file and read it into a pysal weights object
>>> w = pysal.open(pysal.examples.get_path('geobugs_scot'),'r','geobugs_text').read()
WARNING: there are 3 disconnected observations
Island ids: [6, 8, 11]
Get the number of observations from the header
>>> w.n
56
Get the mean number of neighbors
>>> w.mean_neighbors
4.1785714285714288
Get neighbor distances for a single observation
>>> w[1]
{9: 1.0, 19: 1.0, 5: 1.0}

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

write
(obj)[source]¶ Writes a weights object to the opened text file.
Parameters: obj – a weights object
Returns: a GeoBUGS text file
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('geobugs_scot'),'r','geobugs_text')
>>> w = testfile.read()
WARNING: there are 3 disconnected observations
Island ids: [6, 8, 11]
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w','geobugs_text')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created text file
>>> wnew = pysal.open(fname,'r','geobugs_text').read()
WARNING: there are 3 disconnected observations
Island ids: [6, 8, 11]
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)
IOHandlers.geoda_txt – Geoda text plugin¶
New in version 1.0.

class pysal.core.IOHandlers.geoda_txt.GeoDaTxtReader(*args, **kwargs)[source]¶
DataTable provides additional functionality to FileIO for data table files. FileIO handlers that provide data tables should subclass this instead of FileIO.

FORMATS
= ['geoda_txt']¶

MODES
= ['r']¶

by_col
¶

by_col_array
(*args)¶ Return columns of table as a numpy array.
Parameters: *args (any number of strings of length k) – names of variables to extract
Returns: numpy array of shape (n, k)
Notes
If the variables are not all of the same data type, then numpy rules for casting will result in a uniform type applied to all variables.
If only strings are passed to the function, then an array with those columns will be constructed.
If only one list of strings is passed, the output is identical to those strings being passed.
If at least one list is passed and other strings or lists are passed, this returns a tuple containing arrays constructed from each positional argument.
Examples
>>> import pysal as ps
>>> dbf = ps.open(ps.examples.get_path('NAT.dbf'))
>>> hr = dbf.by_col_array('HR70', 'HR80')
>>> hr[0:5]
array([[  0.        ,   8.85582713],
       [  0.        ,  17.20874204],
       [  1.91515848,   3.4507747 ],
       [  1.28864319,   3.26381409],
       [  0.        ,   7.77000777]])
>>> hr = dbf.by_col_array(['HR80', 'HR70'])
>>> hr[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])
>>> hr = dbf.by_col_array(['HR80'])
>>> hr[0:5]
array([[  8.85582713],
       [ 17.20874204],
       [  3.4507747 ],
       [  3.26381409],
       [  7.77000777]])
Numpy only supports homogeneous arrays. See Notes above.
>>> hr = dbf.by_col_array('STATE_NAME', 'HR80')
>>> hr[0:5]
array([['Minnesota', '8.8558271343'],
       ['Washington', '17.208742041'],
       ['Washington', '3.4507746989'],
       ['Washington', '3.2638140931'],
       ['Washington', '7.77000777']], dtype='S20')
>>> y, X = dbf.by_col_array('STATE_NAME', ['HR80', 'HR70'])
>>> y[0:5]
array([['Minnesota'],
       ['Washington'],
       ['Washington'],
       ['Washington'],
       ['Washington']], dtype='S20')
>>> X[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶ Read at most n objects, less if read hits EOF if size is negative or omitted read all objects until EOF returns None if EOF is reached before any objects.

seek
(n)¶ Seek the FileObj to the beginning of the n’th record, if ids are set, seeks to the beginning of the record at id, n

tell
()¶ Return id (or offset) of next object

to_df
(n=1, read_shp=None, **df_kws)¶

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

write
(obj)¶ Must be implemented by subclasses that support ‘w’ subclasses should increment .pos subclasses should also check if obj is an instance of type(list) and redefine this doc string

IOHandlers.gwt — GWT plugin¶
New in version 1.0.

class pysal.core.IOHandlers.gwt.GwtIO(*args, **kwargs)[source]¶
FORMATS
= ['kwt', 'gwt']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

shpName
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj)[source]¶
Parameters: obj – a weights object
Returns: a GWT file; writes the weights object to the opened GWT file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('juvenile.gwt'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.gwt')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created gwt file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.mat — MATLAB Level 4-5 plugin¶
New in version 1.2.

class pysal.core.IOHandlers.mat.MatIO(*args, **kwargs)[source]¶
Opens, reads, and writes weights file objects in MATLAB Level 4-5 MAT format.
MAT files are used in Dr. LeSage’s MATLAB Econometrics library. The MAT file format can handle both full and sparse matrices, and it allows for a matrix dimension greater than 256. In PySAL, row and column headers of a MATLAB array are ignored.
PySAL uses the MATLAB IO tools in scipy. Thus, it is subject to all the limits that loadmat and savemat in scipy have.
Notes
If a given weights object contains too many observations to write it out as a full matrix, PySAL writes out the object as a sparse matrix.
References
MathWorks (2011) “MATLAB 7 MAT-File Format” at http://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf.
scipy matlab io http://docs.scipy.org/doc/scipy/reference/tutorial/io.html

FORMATS
= ['mat']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

varName
¶

write
(obj)[source]¶
Parameters: obj – a weights object
Returns: a MATLAB mat file; writes the weights object to the opened mat file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('spatsymus.mat'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.mat')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created mat file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.mtx — Matrix Market MTX plugin¶
New in version 1.2.

class pysal.core.IOHandlers.mtx.MtxIO(*args, **kwargs)[source]¶
Opens, reads, and writes weights file objects in Matrix Market MTX format.
The Matrix Market MTX format is used to facilitate the exchange of matrix data. In PySAL, it is being tested as a new file format for delivering the weights information of a spatial weights matrix. Although the MTX format supports both full and sparse matrices with different data types, it is assumed that spatial weights files in the mtx format always use the sparse (or coordinate) format with real data values. For now, no additional assumption (e.g., symmetry) is made of the structure of a weights matrix.
With the above assumptions, the structure of a MTX file containing a spatial weights matrix can be defined as follows:
%%MatrixMarket matrix coordinate real general  <--- header 1 (constant)
% Comments start                               <---
% ....                                              0 or more comment lines
% Comments end                                 <---
M N L                                          <--- header 2: rows, columns, entries
I1 J1 A(I1,J1)                                 <---
...                                                 L entry lines
IL JL A(IL,JL)                                 <---
In the MTX format, the index for rows or columns starts with 1.
PySAL uses the MTX IO tools in scipy. Thus, it is subject to all the limits that scipy currently has. Reengineering might be required, since scipy currently reads the entire file into memory.
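The coordinate layout above can be illustrated without scipy. The sketch below emits and re-parses entry lines with the 1-based indices the format requires; mtx_dumps and mtx_loads are hypothetical helpers for illustration, not the MtxIO code:

```python
def mtx_dumps(n_rows, n_cols, entries):
    """Serialize (i, j, value) triplets (1-based) in MTX coordinate format."""
    lines = ["%%MatrixMarket matrix coordinate real general"]  # header 1
    lines.append("%d %d %d" % (n_rows, n_cols, len(entries)))  # header 2: M N L
    for i, j, v in entries:
        lines.append("%d %d %s" % (i, j, v))                   # entry lines
    return "\n".join(lines)

def mtx_loads(text):
    """Parse the text back, skipping the header and % comment lines."""
    rows = [ln for ln in text.splitlines() if not ln.startswith("%")]
    m, n, count = (int(x) for x in rows[0].split())
    entries = [(int(i), int(j), float(v))
               for i, j, v in (ln.split() for ln in rows[1:])]
    assert len(entries) == count
    return m, n, entries

text = mtx_dumps(3, 3, [(1, 2, 0.5), (2, 1, 0.5)])
```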
References
MTX format specification http://math.nist.gov/MatrixMarket/formats.html
scipy matlab io http://docs.scipy.org/doc/scipy/reference/tutorial/io.html

FORMATS
= ['mtx']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1, sparse=False)[source]¶
Parameters: sparse (boolean) – if True, return a pysal WSP object; if False, return a pysal W object

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

write
(obj)[source]¶
Parameters: obj – a weights object
Returns: a MatrixMarket mtx file; writes the weights object to the opened mtx file.
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('wmat.mtx'),'r')
>>> w = testfile.read()
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.mtx')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created mtx file
>>> wnew = pysal.open(fname,'r').read()
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)
Go to the beginning of the test file
>>> testfile.seek(0)
Create a sparse weights instance from the test file
>>> wsp = testfile.read(sparse=True)
Open the new file in write mode
>>> o = pysal.open(fname,'w')
Write the sparse weights object into the open file
>>> o.write(wsp)
>>> o.close()
Read in the newly created mtx file
>>> wsp_new = pysal.open(fname,'r').read(sparse=True)
Compare values from old to new
>>> wsp_new.s0 == wsp.s0
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.pyDbfIO – PySAL DBF plugin¶
New in version 1.0.

class pysal.core.IOHandlers.pyDbfIO.DBF(*args, **kwargs)[source]¶
PySAL DBF Reader/Writer
This DBF handler implements the PySAL DataTable interface.

header
¶ list – A list of field names. The header is a python list of strings; each string is a field name, and a field name must not be longer than 10 characters.

field_spec
¶ list – A list describing the data types of each field. It is comprised of a list of tuples, each tuple describing a field. The format for the tuples is (“Type”, len, precision). Valid types are ‘C’ for character, ‘L’ for bool, ‘D’ for date, and ‘N’ or ‘F’ for number.
Examples
>>> import pysal
>>> dbf = pysal.open(pysal.examples.get_path('juvenile.dbf'), 'r')
>>> dbf.header
['ID', 'X', 'Y']
>>> dbf.field_spec
[('N', 9, 0), ('N', 9, 0), ('N', 9, 0)]

FORMATS
= ['dbf']¶

MODES
= ['r', 'w']¶

by_col
¶

by_col_array
(*args)¶ Return columns of table as a numpy array.
Parameters: *args (any number of strings of length k) – names of variables to extract
Returns: numpy array of shape (n, k)
Notes
If the variables are not all of the same data type, then numpy rules for casting will result in a uniform type applied to all variables.
If only strings are passed to the function, then an array with those columns will be constructed.
If only one list of strings is passed, the output is identical to those strings being passed.
If at least one list is passed and other strings or lists are passed, this returns a tuple containing arrays constructed from each positional argument.
Examples
>>> import pysal as ps
>>> dbf = ps.open(ps.examples.get_path('NAT.dbf'))
>>> hr = dbf.by_col_array('HR70', 'HR80')
>>> hr[0:5]
array([[  0.        ,   8.85582713],
       [  0.        ,  17.20874204],
       [  1.91515848,   3.4507747 ],
       [  1.28864319,   3.26381409],
       [  0.        ,   7.77000777]])
>>> hr = dbf.by_col_array(['HR80', 'HR70'])
>>> hr[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])
>>> hr = dbf.by_col_array(['HR80'])
>>> hr[0:5]
array([[  8.85582713],
       [ 17.20874204],
       [  3.4507747 ],
       [  3.26381409],
       [  7.77000777]])
Numpy only supports homogeneous arrays. See Notes above.
>>> hr = dbf.by_col_array('STATE_NAME', 'HR80')
>>> hr[0:5]
array([['Minnesota', '8.8558271343'],
       ['Washington', '17.208742041'],
       ['Washington', '3.4507746989'],
       ['Washington', '3.2638140931'],
       ['Washington', '7.77000777']], dtype='S20')
>>> y, X = dbf.by_col_array('STATE_NAME', ['HR80', 'HR70'])
>>> y[0:5]
array([['Minnesota'],
       ['Washington'],
       ['Washington'],
       ['Washington'],
       ['Washington']], dtype='S20')
>>> X[0:5]
array([[  8.85582713,   0.        ],
       [ 17.20874204,   0.        ],
       [  3.4507747 ,   1.91515848],
       [  3.26381409,   1.28864319],
       [  7.77000777,   0.        ]])

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶ Read at most n objects, less if read hits EOF if size is negative or omitted read all objects until EOF returns None if EOF is reached before any objects.

tell
()¶ Return id (or offset) of next object

to_df
(n=1, read_shp=None, **df_kws)¶

truncate
(size=None)¶ Should be implemented by subclasses and redefine this doc string

IOHandlers.pyShpIO – Shapefile plugin¶
Shapefile Plugin for PySAL’s FileIO System
New in version 1.0.
PySAL ShapeFile Reader and Writer based on pure python shapefile module.

class pysal.core.IOHandlers.pyShpIO.PurePyShpWrapper(*args, **kwargs)[source]¶
FileIO handler for ESRI ShapeFiles.
Notes
This class wraps _pyShpIO’s shp_file class with the PySAL FileIO API. shp_file can be used without PySAL.

Formats
¶ list – A list of supported file extensions

Modes
¶ list – A list of supported file modes
Examples
>>> import tempfile
>>> f = tempfile.NamedTemporaryFile(suffix='.shp'); fname = f.name; f.close()
>>> import pysal
>>> i = pysal.open(pysal.examples.get_path('10740.shp'),'r')
>>> o = pysal.open(fname,'w')
>>> for shp in i:
...     o.write(shp)
>>> o.close()
>>> open(pysal.examples.get_path('10740.shp'),'rb').read() == open(fname,'rb').read()
True
>>> open(pysal.examples.get_path('10740.shx'),'rb').read() == open(fname[:-1]+'x','rb').read()
True
>>> import os
>>> os.remove(fname); os.remove(fname.replace('.shp','.shx'))

FORMATS
= ['shp', 'shx']¶

MODES
= ['w', 'r', 'wb', 'rb']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n If .ids is set n should be an id, else, n should be an offset

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

read
(n=1)¶ Read at most n objects, less if read hits EOF if size is negative or omitted read all objects until EOF returns None if EOF is reached before any objects.

seek
(n)¶ Seek the FileObj to the beginning of the n’th record; if ids are set, seeks to the beginning of the record with id n.

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses, which should also redefine this docstring.

write
(obj)¶ Must be implemented by subclasses that support ‘w’. Subclasses should increment .pos, check whether obj is an instance of list, and redefine this docstring.

IOHandlers.stata_txt
— STATA plugin¶New in version 1.2.

class
pysal.core.IOHandlers.stata_txt.
StataTextIO
(*args, **kwargs)[source]¶ Opens, reads, and writes weights file objects in STATA text format.
Spatial weights objects in the STATA text format are used in STATA’s sppack library through the spmat command. This format is a simple text file delimited by whitespace. The spmat command does not specify a file extension, but txt appears to be the default, and PySAL assumes it.
The first line of the STATA text file is a header containing the number of observations. After this header line, the file includes at least one data column that contains unique ids or record numbers of observations. When an id variable is not specified for the original spatial weights matrix in STATA, record numbers, starting from one, are used to identify individual observations. The spmat command appears to allow only integer IDs, which PySAL also assumes.
A STATA text file can have one of the following structures according to its export options in STATA.
Structure 1: encoding using the list of neighbor ids

[Line 1] [Number_of_Observations]
[Line 2] [ID_of_Obs_1] [ID_of_Neighbor_1_of_Obs_1] [ID_of_Neighbor_2_of_Obs_1] ... [ID_of_Neighbor_m_of_Obs_1]
[Line 3] [ID_of_Obs_2]
[Line 4] [ID_of_Obs_3] [ID_of_Neighbor_1_of_Obs_3] [ID_of_Neighbor_2_of_Obs_3] ...
...

Note that the IDs of island observations are still recorded.
Structure 2: encoding using a full matrix format

[Line 1] [Number_of_Observations]
[Line 2] [ID_of_Obs_1] [w_11] [w_12] ... [w_1n]
[Line 3] [ID_of_Obs_2] [w_21] [w_22] ... [w_2n]
[Line 4] [ID_of_Obs_3] [w_31] [w_32] ... [w_3n]
...
[Line n+1] [ID_of_Obs_n] [w_n1] [w_n2] ... [w_nn]

where w_ij is a general weight; that is, w_ij can be either a binary value or any numeric value. If an observation is an island, all of its w columns contain 0.
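The Structure 1 layout above can be illustrated with a short standalone sketch. This is not PySAL code, and `parse_stata_neighbor_list` is a hypothetical helper name; it only assumes the layout just described (a header line with the observation count, then one line per observation: its id followed by its neighbor ids, with islands appearing as a bare id).

```python
def parse_stata_neighbor_list(lines):
    """Parse Structure 1 lines into {observation_id: [neighbor ids]}."""
    # Header line holds the number of observations.
    n = int(lines[0].split()[0])
    neighbors = {}
    for line in lines[1:]:
        ids = [int(tok) for tok in line.split()]
        if not ids:
            continue
        # First token is the observation id; the rest are its neighbors.
        # Island observations appear as a bare id with no neighbors.
        neighbors[ids[0]] = ids[1:]
    assert len(neighbors) == n
    return neighbors

# Observation 3 here is connected only to 2; none are islands.
sample = ["3", "1 2", "2 1 3", "3 2"]
print(parse_stata_neighbor_list(sample))  # → {1: [2], 2: [1, 3], 3: [2]}
```

An island in this encoding would be a line such as `"5"` with no trailing neighbor ids, yielding an empty neighbor list for id 5.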
References
Drukker D.M., Peng H., Prucha I.R., and Raciborski R. (2011) “Creating and managing spatial-weighting matrices using the spmat command”
Notes
The spmat command allows users to attach notes to a spatial weights matrix object in STATA. However, all such notes are lost when the matrix is exported, and PySAL does not preserve them either.

FORMATS
= ['stata_text']¶

MODES
= ['r', 'w']¶

by_row
¶

cast
(key, typ)¶ cast key as typ

check
()¶ Prints the contents of the registry

flush
()¶

get
(n)¶ Seeks the file to n and returns n. If .ids is set, n should be an id; otherwise, n should be an offset.

getType
(dataPath, mode, dataFormat=None)¶ Parse the dataPath and return the data type

ids
¶

next
()¶ A FileIO object is its own iterator, see StringIO

open
(*args, **kwargs)¶ Alias for FileIO()

rIds
¶

tell
()¶ Return id (or offset) of next object

truncate
(size=None)¶ Should be implemented by subclasses, which should also redefine this docstring.

write
(obj, matrix_form=False)[source]¶ Write a weights object to the opened text file.
Parameters: obj – a weights object
Returns: a STATA text file
Examples
>>> import tempfile, pysal, os
>>> testfile = pysal.open(pysal.examples.get_path('stata_sparse.txt'),'r','stata_text')
>>> w = testfile.read()
WARNING: there are 7 disconnected observations
Island ids: [5, 9, 10, 11, 12, 14, 15]
Create a temporary file for this example
>>> f = tempfile.NamedTemporaryFile(suffix='.txt')
Reassign to new var
>>> fname = f.name
Close the temporary named file
>>> f.close()
Open the new file in write mode
>>> o = pysal.open(fname,'w','stata_text')
Write the Weights object into the open file
>>> o.write(w)
>>> o.close()
Read in the newly created text file
>>> wnew = pysal.open(fname,'r','stata_text').read()
WARNING: there are 7 disconnected observations
Island ids: [5, 9, 10, 11, 12, 14, 15]
Compare values from old to new
>>> wnew.pct_nonzero == w.pct_nonzero
True
Clean up temporary file created for this example
>>> os.remove(fname)

IOHandlers.wk1
— Lotus WK1 plugin¶New in version 1.2.

class
pysal.core.IOHandlers.wk1.
Wk1IO
(*args, **kwargs)[source]¶ Based on MATLAB’s wk1read.m and wk1write.m, written by Brian M. Bourgault on 10/22/93.
Opens, reads, and writes weights file objects in Lotus Wk1 format.
Lotus Wk1 file is used in Dr. LeSage’s MATLAB Econometrics library.
A Wk1 file holds a spatial weights object in full matrix form without any row or column headers. The maximum number of columns supported in a Wk1 file is 256. Wk1 numbers rows (and columns) from 0 and uses little-endian binary encoding. In PySAL, when the number of observations is n, it is assumed that each cell of the n*n (=m) matrix is either blank or holds a number.
The internal structure of a Wk1 file written by PySAL is as follows:

[BOF][DIM][CPI][CAL][CMODE][CORD][SPLIT][SYNC][CURS][WIN][HCOL][MRG][LBL][CELL_1]...[CELL_m][EOF]

where [CELL_k] equals [DTYPE][DLEN][DFORMAT][CINDEX][CVALUE]. The parts between [BOF] and [CELL_1] vary with the software used to write the wk1 file; PySAL ignores them when reading. Each part of this structure is detailed below.
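The record stream described above can be walked with a short sketch. This is not PySAL’s reader; it assumes the common WK1 record layout in which each record begins with a little-endian 2-byte type code and a 2-byte payload length, and `iter_wk1_records` is a hypothetical name introduced for illustration.

```python
import struct

def iter_wk1_records(data):
    """Yield (record_type, payload) tuples from raw WK1 bytes.

    Assumes each record is [type:uint16][length:uint16][payload],
    all little-endian, terminated by an EOF record (type 0x0001).
    """
    pos = 0
    while pos + 4 <= len(data):
        rtype, rlen = struct.unpack_from('<HH', data, pos)
        pos += 4
        payload = data[pos:pos + rlen]
        pos += rlen
        yield rtype, payload
        if rtype == 0x0001:  # EOF record terminates the stream
            break

# Tiny synthetic stream: a BOF record (type 0x0000 with a 2-byte
# version payload) followed by an EOF record (type 0x0001, empty).
stream = struct.pack('<HHH', 0x0000, 2, 0x0406) + struct.pack('<HH', 0x0001, 0)
records = list(iter_wk1_records(stream))
print(records)  # → [(0, b'\x06\x04'), (1, b'')]
```

The same walk explains why PySAL can skip the variable region between [BOF] and [CELL_1]: each record announces its own length, so unknown record types can be stepped over without interpreting them.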