pygtfs - a database backed python gtfs interface!

Get it

The source is available on github: https://github.com/jarondl/pygtfs

Basic usage

To include pygtfs functionality in your application, use import pygtfs.

The first thing you need to do is create a new schedule object:

sched = pygtfs.Schedule(":memory:")

This will create an in-memory sqlite database. Alternatively, you can supply a filename to be used for sqlite (such as 'gtfs.sqlite'), or a sqlalchemy database connection.

Then you can load gtfs feeds into the database by using append_feed:

pygtfs.append_feed(sched, "sample-gtfs-feed.zip")

The gtfs feed can be either a .zip file or a folder full of .txt files. You can add as many feeds as you want into a single database without fear of conflicts (although you can end up with duplicates, for example two stop names for one place, one from each feed). Another option for loading feeds is the 'gtfs2db' script, explained later.
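To make the two accepted feed layouts concrete, here is a minimal sketch that writes a tiny feed as a .zip archive using only the standard library. The sample rows, the `TABLES` dictionary and the `write_feed_zip` helper are hypothetical and exist only for illustration; the file and column names follow the GTFS reference.

```python
import csv
import io
import zipfile

# Hypothetical sample data; file and column names follow the GTFS reference.
TABLES = {
    "agency.txt": [
        ["agency_id", "agency_name", "agency_url", "agency_timezone"],
        ["DEMO", "Demo Transit", "http://example.com", "America/Los_Angeles"],
    ],
    "stops.txt": [
        ["stop_id", "stop_name", "stop_lat", "stop_lon"],
        ["S1", "First Street", "37.00", "-122.00"],
        ["S2", "Second Street", "37.01", "-122.01"],
    ],
    "routes.txt": [
        ["route_id", "agency_id", "route_short_name", "route_long_name", "route_type"],
        ["R1", "DEMO", "1", "Demo Line", "3"],
    ],
    "calendar.txt": [
        ["service_id", "monday", "tuesday", "wednesday", "thursday", "friday",
         "saturday", "sunday", "start_date", "end_date"],
        ["WKDY", "1", "1", "1", "1", "1", "0", "0", "20240101", "20241231"],
    ],
    "trips.txt": [
        ["route_id", "service_id", "trip_id"],
        ["R1", "WKDY", "T1"],
    ],
    "stop_times.txt": [
        ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"],
        ["T1", "08:00:00", "08:00:00", "S1", "1"],
        ["T1", "08:10:00", "08:10:00", "S2", "2"],
    ],
}


def write_feed_zip(path, tables=TABLES):
    """Write each table as a CSV member of a zip archive at `path`."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in tables.items():
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerows(rows)
            archive.writestr(name, buffer.getvalue())
    return path
```

The path returned by `write_feed_zip("demo-feed.zip")` could then be handed to pygtfs.append_feed in place of "sample-gtfs-feed.zip". Whether pygtfs accepts a feed this small has not been checked here.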

The Schedule object represents a collection of objects that correspond to the contents of a GTFS feed. You can get the list of agencies, stops, routes, etc. with straightforwardly named attributes; see pygtfs.schedule for more details.

>>> sched.agencies
[<Agency BART: Bay Area Rapid Transit>, <Agency AirBART: AirBART>]
>>> sched.routes
[<Route AirBART: >, <Route 01: >, <Route 03: >, <Route 05: >, <Route 07: >, <Route 11: >]

For GTFS entities that are identified by a dataset-unique identifier, there is also a function to get them by id:

>>> sched.agencies_by_id('AirBART')
[<Agency AirBART: AirBART>]
>>> sched.stops_by_id('SFIA')
[<Stop SFIA: San Francisco Int'l Airport>]
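Because the by-id functions return lists, and a database holding several feeds may contain the same id more than once, calling code often wants exactly one match. A minimal sketch of such a helper follows; `get_one` is a hypothetical name and not part of pygtfs.

```python
def get_one(matches, entity_id):
    """Return the single entity in `matches`, or raise if there are none or several."""
    if not matches:
        raise KeyError(f"no entity with id {entity_id!r}")
    if len(matches) > 1:
        # Can happen when two loaded feeds use the same id.
        raise LookupError(f"{len(matches)} entities share id {entity_id!r}")
    return matches[0]
```

With the schedule from the examples above, this would be used as `get_one(sched.stops_by_id('SFIA'), 'SFIA')`.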

The GTFS entity objects have attributes that correspond in name to the field definitions in the [GTFS reference](https://developers.google.com/transit/gtfs/reference).

>>> sched.stops_by_id('SFIA')[0].stop_name
u"San Francisco Int'l Airport"
>>> sched.routes[1].route_long_name
u'Pittsburg/Bay Point - SFIA/Millbrae'

GTFS entities which cross-reference each other can also be obtained straightforwardly with attributes (again, see the detailed reference below for full details):

>>> sched.trips_by_id('01SFO10')[0].service  # the service associated with trip 01SFO10
<Service WKDY (MTWThFSSu)>

gtfs2db

Running setup.py install also installs a command-line script, gtfs2db, that takes a GTFS zip file or directory as an argument and loads the data into a database usable with pygtfs. Run gtfs2db --help for more.

Detailed reference

The best place to start is pygtfs.schedule

Contents:

The schedule module

class pygtfs.schedule.Schedule(db_connection)

Represents the full database.

The schedule is the most important object in pygtfs. It represents the entire dataset. Most of the properties come straight from the gtfs reference. Two of them were renamed: calendar is called services, and calendar_dates is called service_exceptions. One addition is the feeds table, which is here to support more than one feed in a database.
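The services and service_exceptions properties together determine whether a service runs on a given day. The sketch below combines them following the GTFS reference, where an exception_type of 1 adds service and 2 removes it. It assumes that dates are exposed as datetime.date objects and weekday flags as truthy values, which is not stated in this document; `service_runs_on` is a hypothetical helper.

```python
import datetime

WEEKDAY_FLAGS = ("monday", "tuesday", "wednesday", "thursday",
                 "friday", "saturday", "sunday")


def service_runs_on(service, exceptions, day):
    """Return True if `service` operates on `day` (a datetime.date).

    Assumes start_date, end_date and date are datetime.date objects
    and that the weekday attributes are truthy when service runs.
    """
    for exc in exceptions:
        same_service = (exc.feed_id == service.feed_id
                        and exc.service_id == service.service_id)
        if same_service and exc.date == day:
            # GTFS calendar_dates: 1 = service added, 2 = service removed.
            return int(exc.exception_type) == 1
    in_range = service.start_date <= day <= service.end_date
    return in_range and bool(getattr(service, WEEKDAY_FLAGS[day.weekday()]))
```

A call would look like `service_runs_on(sched.services[0], sched.service_exceptions, datetime.date.today())`.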

Each of the properties is a list created upon access by sqlalchemy. Each element of the list has attributes following the gtfs reference. In addition, if an element is related to another table, the related objects can also be accessed by attribute.

Parameters: db_connection – Either a sqlalchemy database url or a filename to be used with sqlite.
agencies

A list of pygtfs.gtfs_entities.Agency objects

agencies_by_id(id)

A list of pygtfs.gtfs_entities.Agency objects with matching id

agencies_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Agency objects

drop_feed(feed_id)

Delete a feed from a database by feed id

fare_rules

A list of pygtfs.gtfs_entities.FareRule objects

fare_rules_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.FareRule objects

fares

A list of pygtfs.gtfs_entities.Fare objects

fares_by_id(id)

A list of pygtfs.gtfs_entities.Fare objects with matching id

fares_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Fare objects

feed_infos

A list of pygtfs.gtfs_entities.FeedInfo objects

feed_infos_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.FeedInfo objects

feeds

A list of pygtfs.gtfs_entities.Feed objects

feeds_by_id(id)

A list of pygtfs.gtfs_entities.Feed objects with matching id

feeds_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Feed objects

frequencies

A list of pygtfs.gtfs_entities.Frequency objects

frequencies_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Frequency objects

routes

A list of pygtfs.gtfs_entities.Route objects

routes_by_id(id)

A list of pygtfs.gtfs_entities.Route objects with matching id

routes_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Route objects

service_exceptions

A list of pygtfs.gtfs_entities.ServiceException objects

service_exceptions_by_id(id)

A list of pygtfs.gtfs_entities.ServiceException objects with matching id

service_exceptions_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.ServiceException objects

services

A list of pygtfs.gtfs_entities.Service objects

services_by_id(id)

A list of pygtfs.gtfs_entities.Service objects with matching id

services_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Service objects

shapes

A list of pygtfs.gtfs_entities.ShapePoint objects

shapes_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.ShapePoint objects

stop_times

A list of pygtfs.gtfs_entities.StopTime objects

stop_times_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.StopTime objects

stops

A list of pygtfs.gtfs_entities.Stop objects

stops_by_id(id)

A list of pygtfs.gtfs_entities.Stop objects with matching id

stops_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Stop objects

transfers

A list of pygtfs.gtfs_entities.Transfer objects

transfers_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Transfer objects

translations

A list of pygtfs.gtfs_entities.Translation objects

translations_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Translation objects

trips

A list of pygtfs.gtfs_entities.Trip objects

trips_by_id(id)

A list of pygtfs.gtfs_entities.Trip objects with matching id

trips_query

A sqlalchemy.orm.Query object to fetch pygtfs.gtfs_entities.Trip objects
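Since the list properties are built in full on every access, the *_query properties are the natural choice when only a subset or a count is needed. The sketch below uses filter_by(), all() and count(), which are standard sqlalchemy.orm.Query methods. The helper names `routes_of_type` and `count_stop_times` are hypothetical, and filtering on route_type assumes that attribute maps to a column, as the Route listing suggests.

```python
def routes_of_type(sched, route_type):
    """Fetch only the routes with the given GTFS route_type."""
    # The filtering happens in the database rather than in Python.
    return sched.routes_query.filter_by(route_type=route_type).all()


def count_stop_times(sched):
    """Count stop_times rows without loading them as objects."""
    return sched.stop_times_query.count()
```

For example, `routes_of_type(sched, 3)` would return the bus routes, since 3 is the GTFS route_type for buses.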

The loader module

pygtfs.loader.append_feed(schedule, feed_filename, strip_fields=True, chunk_size=5000, agency_id_override=None)
pygtfs.loader.delete_feed(schedule, feed_filename, interactive=False)
pygtfs.loader.list_feeds(schedule)
pygtfs.loader.overwrite_feed(schedule, feed_filename, *args, **kwargs)
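As an illustration of how the loader functions might be combined, the sketch below reloads a set of feeds into an existing schedule. It assumes that overwrite_feed behaves like the gtfs2db overwrite command described in this document (delete feeds with the same original filename, then append); `refresh_feeds` is a hypothetical helper.

```python
def refresh_feeds(schedule, feed_paths):
    """Replace each feed in `feed_paths` and return the paths processed."""
    # Imported here so the sketch can be read without pygtfs installed.
    from pygtfs.loader import overwrite_feed

    processed = []
    for path in feed_paths:
        # Signature as listed above: overwrite_feed(schedule, feed_filename, ...)
        overwrite_feed(schedule, path)
        processed.append(path)
    return processed
```

A nightly update job could call `refresh_feeds(sched, ["bart.zip", "muni.zip"])`, where the file names are placeholders.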

GTFS entities

GTFS entities.

These are the entities returned by the various pygtfs.schedule lists. Most of the attributes come directly from the gtfs reference. Where possible, relations are also taken into account, e.g. the Route class has a trips attribute holding the list of trips for that route.
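The relation attributes make it possible to walk from one entity to another without writing queries. A minimal sketch using only attributes listed below (Route.trips, Trip.stop_times and the StopTime fields) follows; the helper names are hypothetical, and the sort assumes stop_sequence values are comparable numbers.

```python
def trips_per_route(sched):
    """Map (feed_id, route_id) to the number of trips on that route."""
    # feed_id is part of the key because two feeds may reuse a route_id.
    return {(route.feed_id, route.route_id): len(route.trips)
            for route in sched.routes}


def stop_sequence(trip):
    """Return (stop_sequence, stop_id, departure_time) tuples in travel order."""
    ordered = sorted(trip.stop_times, key=lambda st: st.stop_sequence)
    return [(st.stop_sequence, st.stop_id, st.departure_time) for st in ordered]
```

For instance, `stop_sequence(sched.trips[0])` would list the stops of the first trip in the order they are visited.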

class pygtfs.gtfs_entities.Agency(**kwargs)
agency_email
agency_fare_url
agency_id
agency_lang
agency_name
agency_phone
agency_timezone
agency_url
feed_id
id
routes
class pygtfs.gtfs_entities.Fare(**kwargs)
agency_id
currency_type
fare_id
feed_id
id
payment_method
price
transfer_duration
transfers
class pygtfs.gtfs_entities.FareRule(**kwargs)
contains_id
destination_id
fare_id
feed_id
origin_id
route_id
class pygtfs.gtfs_entities.Feed(**kwargs)
agencies
fare_rules
fares
feed_append_date
feed_id
feed_name
feedinfo
frequencies
id
routes
service_exceptions
services
shape_points
stop_times
stops
transfers
translations
trips
class pygtfs.gtfs_entities.FeedInfo(**kwargs)
feed_end_date
feed_id
feed_lang
feed_publisher_name
feed_publisher_url
feed_start_date
feed_version
class pygtfs.gtfs_entities.Frequency(**kwargs)
end_time
exact_times
feed_id
headway_secs
start_time
trip_id
class pygtfs.gtfs_entities.Route(**kwargs)
agency_id
fare_rules
feed_id
id
route_color
route_desc
route_id
route_long_name
route_short_name
route_text_color
route_type
route_url
trips
valid_extended_route_types = [0, 1, 2, 3, 4, 5, 6, 7, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 300, 400, 401, 402, 403, 404, 405, 500, 600, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 800, 900, 901, 902, 903, 904, 905, 906, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1200, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1400, 1401, 1402, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1600, 1601, 1602, 1603, 1604, 1700, 1701, 1702]
class pygtfs.gtfs_entities.Service(**kwargs)
end_date
feed_id
friday
id
monday
saturday
service_id
start_date
sunday
thursday
trips
tuesday
wednesday
class pygtfs.gtfs_entities.ServiceException(**kwargs)
date
exception_type
feed_id
id
service_id
class pygtfs.gtfs_entities.ShapePoint(**kwargs)
feed_id
shape_dist_traveled
shape_id
shape_pt_lat
shape_pt_lon
shape_pt_sequence
trips
class pygtfs.gtfs_entities.Stop(**kwargs)
feed_id
id
location_type
parent_station
platform_code
stop_code
stop_desc
stop_id
stop_lat
stop_lon
stop_name
stop_times
stop_timezone
stop_url
transfers_from
transfers_to
translations
wheelchair_boarding
zone_id
class pygtfs.gtfs_entities.StopTime(**kwargs)
arrival_time
departure_time
drop_off_type
feed_id
pickup_type
shape_dist_traveled
stop_headsign
stop_id
stop_sequence
timepoint
trip_id
class pygtfs.gtfs_entities.Transfer(**kwargs)
feed_id
from_stop_id
min_transfer_time
to_stop_id
transfer_type
class pygtfs.gtfs_entities.Translation(**kwargs)
feed_id
lang
trans_id
translation
class pygtfs.gtfs_entities.Trip(**kwargs)
bikes_allowed
block_id
direction_id
feed_id
frequencies
id
route_id
service_id
shape_id
stop_times
trip_headsign
trip_id
trip_short_name
wheelchair_accessible
pygtfs.gtfs_entities.create_foreign_keys(*key_names)

Create foreign key constraints, always including feed_id, and relying on the convention that the key name is the same.

The gtfs2db script

This is a script to manage the database. Here is its help message:

 gtfs2db - convert a gtfs feed to a pygtfs database

Usage:
  gtfs2db append <feed_file> <database> [--chunk-size <integer>]
  gtfs2db overwrite <feed_file> <database> [-i, --interactive] [--chunk-size <integer>]
  gtfs2db delete <feed_file> <database> [-i, --interactive]
  gtfs2db list <database>
  gtfs2db (-h | --help)
  gtfs2db --version

Options:
  -h --help          Show this help screen.
  --version          Show version.
  -i --interactive   Ask before deleting or overwriting existing feeds.
  --chunk-size <int> How often to flush database. If memory consumption is high,
                     lower this number. [default: 10000]
  <feed_file>        The gtfs file on which to operate. Can be either a folder
                     containing .txt files, or a .zip file.
  <database>         The database. Can be either a file, which is interpreted
                     as an sqlite database stored in this file, or a sqlalchemy
                     database connection.

Commands:
  append            appends the gtfs feed to the database
  overwrite         delete any existing feeds which had the same original
                    filename as the new file, and then append the new file.
  delete            delete from the database any feeds with the name supplied.
  list              list existing feeds in the database.

Description:
  This is a tool to manage a database containing several gtfs feeds. The
  database is in a pygtfs 0.1.0 format, and can be stored as any database
  supported by sqlalchemy (the default being sqlite).
  The database file can later be used to create a `pygtfs.Schedule` instance.
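When gtfs2db has to be driven from another Python program, the usage lines above can be turned into an argument list. The following is a minimal sketch based only on the help text shown here; `gtfs2db_command` is a hypothetical helper and the script itself is not run.

```python
def gtfs2db_command(action, database, feed_file=None,
                    chunk_size=None, interactive=False):
    """Build the argument list for one gtfs2db invocation."""
    if action not in ("append", "overwrite", "delete", "list"):
        raise ValueError(f"unknown gtfs2db command: {action!r}")
    command = ["gtfs2db", action]
    if action != "list":
        if feed_file is None:
            raise ValueError(f"{action} needs a feed file")
        command.append(feed_file)
    command.append(database)
    # Per the usage lines, only overwrite and delete accept --interactive.
    if interactive and action in ("overwrite", "delete"):
        command.append("--interactive")
    # Per the usage lines, only append and overwrite accept --chunk-size.
    if chunk_size is not None and action in ("append", "overwrite"):
        command += ["--chunk-size", str(chunk_size)]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the standard library once gtfs2db is installed.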

Internally used modules

class pygtfs.feed.CSV(rows, feedtype='CSVTuple', columns=None)

A CSV file.

next()
class pygtfs.feed.Feed(filename, strip_fields=True)

A collection of CSV files with headers, either zipped into an archive or loose in a folder.

python2_reader(filename)
python3_reader(filename)
read_table(filename, columns)
pygtfs.feed.derive_feed_name(filename)
