EEmeter: tools for calculating metered energy savings


EEmeter is an open source toolkit for implementing and developing standard methods for calculating normalized metered energy consumption (NMEC) and avoided energy use.

Background - why use the EEMeter library

At the time of writing (Sept 2018), the OpenEEmeter, as implemented in the eemeter package and sister eeweather package, contains the most complete open source implementation of the CalTRACK Methods, which specify a family of ways to calculate and aggregate estimates of avoided energy use at a single meter and are particularly suitable for use in pay-for-performance (P4P) programs.

The eemeter package contains a toolkit written in the Python language which may help in implementing a CalTRACK compliant analysis (see CalTRACK Compliance). It contains a modular set of functions, parameters, and classes which can be configured to run the CalTRACK methods and close variants.

Note

Please keep in mind that use of the OpenEEmeter is neither necessary nor sufficient for compliance with the CalTRACK method specification. For example, while the CalTRACK methods set specific hard limits for the purpose of standardization and consistency, the EEmeter library can be configured to edit or entirely ignore those limits. This is because the eemeter package is used not only for compliance with, but also for development of the CalTRACK methods.

Please also keep in mind that the EEmeter assumes that certain data cleaning tasks specified in the CalTRACK methods have occurred prior to usage with the eemeter. The package proactively exposes warnings to point out issues of this nature where possible.

Installation

EEmeter is a Python package and can be installed with pip.

$ pip install eemeter

Note

If you are having trouble installing, see Using with Anaconda.

Features

  • Candidate model selection
  • Data sufficiency checking
  • Reference implementation of standard methods
    • CalTRACK Daily Method
    • CalTRACK Monthly Billing Method
    • CalTRACK Hourly Method
  • Flexible sources of temperature data. See EEweather.
  • Model serialization
  • First-class warnings reporting
  • Pandas DataFrame support
  • Visualization tools

Roadmap for 2020 development

The OpenEEmeter project growth goals for the year fall into two categories:

  1. Community goals - we want to help our community thrive and continue to grow.
  2. Technical goals - we want to keep building the library in new ways that make it as easy as possible to use.

Community goals

  1. Develop project documentation and tutorials

A number of users have expressed how hard it is to get started when tutorials are out of date. We will dedicate time and energy this year to help create high quality tutorials that build upon the API documentation and existing tutorials.

  2. Make it easier to contribute

As our user base grows, the need and desire for users to contribute back to the library also grows, and we want to make this as seamless as possible. This means writing and maintaining contribution guides, and creating checklists to guide users through the process.

Technical goals

  1. Implement new CalTRACK recommendations

The CalTRACK process continues to improve the underlying methods used in the OpenEEmeter. Our primary technical goal is to keep up with these changes and continue to be a resource for testing and experimentation during the CalTRACK methods setting process.

  2. Hourly model visualizations

The hourly methods implemented in the OpenEEMeter library are not yet packaged with high quality visualizations like the daily and billing methods are. As we build and package new visualizations with the library, more users will be able to understand, deploy, and contribute to the hourly methods.

  3. Weather normal and unusual scenarios

The EEweather package, which supports the OpenEEmeter, comes packaged with publicly available weather normal scenarios. One feature that could make these easier to use would be packaged methods for creating custom weather year scenarios.

  4. Greater weather coverage

The weather station coverage in the EEweather package includes full coverage of the US and Australia, but with some technical work it could be expanded to cover more regions, or even the whole world.

Usage Guides

Basic Usage

Loading sample data

EEMeter comes packaged with some simulated sample data.

Note

This data is not to be used for methods testing! It is designed to have obvious (but completely unrealistic) behavior to showcase building temperature response.

To see a list of available sample data files, use eemeter.samples:

>>> eemeter.samples()
['il-electricity-cdd-hdd-hourly',
 'il-electricity-cdd-hdd-daily',
 'il-electricity-cdd-hdd-billing_monthly',
 'il-electricity-cdd-hdd-billing_bimonthly',
 'il-electricity-cdd-only-hourly',
 'il-electricity-cdd-only-daily',
 'il-electricity-cdd-only-billing_monthly',
 'il-electricity-cdd-only-billing_bimonthly',
 'il-gas-hdd-only-hourly',
 'il-gas-hdd-only-daily',
 'il-gas-hdd-only-billing_monthly',
 'il-gas-hdd-only-billing_bimonthly',
 'il-gas-intercept-only-hourly',
 'il-gas-intercept-only-daily',
 'il-gas-intercept-only-billing_monthly',
 'il-gas-intercept-only-billing_bimonthly']

To load meter data, temperature data, and metadata, use eemeter.load_sample:

>>> meter_data, temperature_data, metadata = \
...     eemeter.load_sample('il-electricity-cdd-hdd-daily')
>>> meter_data.head()
                           value
start
2015-11-22 00:00:00+00:00  32.34
2015-11-23 00:00:00+00:00  23.80
2015-11-24 00:00:00+00:00  26.26
2015-11-25 00:00:00+00:00  21.32
2015-11-26 00:00:00+00:00   6.70
>>> temperature_data.head()
dt
2015-11-22 06:00:00+00:00    21.01
2015-11-22 07:00:00+00:00    20.35
2015-11-22 08:00:00+00:00    19.38
2015-11-22 09:00:00+00:00    19.02
2015-11-22 10:00:00+00:00    17.82
Name: tempF, dtype: float64

The metadata dict contains simulated project ground truth, such as roughly expected disaggregated annual usage, savings, and project dates.

Loading data from CSV

Default meter data CSV format:

start,value
2015-11-22T00:00:00+00:00,32.34
2015-11-23T00:00:00+00:00,23.80
2015-11-24T00:00:00+00:00,26.26
2015-11-25T00:00:00+00:00,21.32
2015-11-26T00:00:00+00:00,6.70
...

To load meter data from a CSV, use eemeter.meter_data_from_csv:

>>> meter_data = eemeter.meter_data_from_csv(f)  # file handle

The eemeter.meter_data_from_csv function has lots of configurable options for data that is formatted differently! Check out the API docs for more info.

Default temperature data CSV format:

dt,tempF
2015-11-22T00:00:00+06:00,21.01
2015-11-22T01:00:00+06:00,20.35
2015-11-22T02:00:00+06:00,19.38
2015-11-22T03:00:00+06:00,19.02
2015-11-22T04:00:00+06:00,17.82
...

To load temperature data from a CSV, use eemeter.temperature_data_from_csv. (See also EEweather):

>>> temperature_data = eemeter.temperature_data_from_csv(f)  # file handle

The eemeter.temperature_data_from_csv function also has lots of configurable options for data that is formatted differently! Check out the API docs for more info.

These methods also work with gzipped files (e.g., the sample data):

>>> meter_data = eemeter.meter_data_from_csv(f, gzipped=True)

If frequency is known ('hourly', 'daily'), this will load that data with an index of the appropriate frequency. This helps the data formatting methods do the right thing.

>>> daily_meter_data = eemeter.meter_data_from_csv(f, freq='daily')

Creating design matrix datasets

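To create a design matrix, use one of the following functions:

  • eemeter.create_caltrack_daily_design_matrix
  • eemeter.create_caltrack_billing_design_matrix
  • eemeter.create_caltrack_hourly_preliminary_design_matrix
  • eemeter.create_caltrack_hourly_segmented_design_matrices

For example: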

>>> meter_data, temperature_data, metadata = \
...     eemeter.load_sample('il-electricity-cdd-hdd-daily')
>>> data = eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)

Running Daily and Billing CalTRACK methods

Note

For complete compliance with CalTRACK methods, please ensure that input data meets requirements in section 2.1 of the CalTRACK methods specification and uses settings defined in CalTRACK Compliance.

To run the CalTRACK daily or billing methods, you need a pandas.DataFrame with the following columns:

  • meter_value: Daily average metered usage values for each point.
  • cdd_<cooling_balance_point>: Average period daily cooling degree days for a particular cooling balance point.
  • hdd_<heating_balance_point>: Average period daily heating degree days for a particular heating balance point.

For each balance point you want to include in the grid search, you must provide a separate cdd_<> or hdd_<> column.
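
As a rough illustration of what the cdd_<> and hdd_<> columns hold, the sketch below computes degree days from daily mean temperatures for a few balance points. It is plain Python rather than eemeter's own implementation; the helper name degree_day_columns and the temperatures are hypothetical, and it assumes degree days are taken from the daily mean temperature.

```python
def degree_day_columns(daily_mean_temps_f, heating_balance_points, cooling_balance_points):
    """Return a dict mapping column name -> list of degree day values (one per day)."""
    columns = {}
    for bp in heating_balance_points:
        # heating degree days: how far the daily mean falls below the balance point
        columns["hdd_%d" % bp] = [max(bp - t, 0.0) for t in daily_mean_temps_f]
    for bp in cooling_balance_points:
        # cooling degree days: how far the daily mean rises above the balance point
        columns["cdd_%d" % bp] = [max(t - bp, 0.0) for t in daily_mean_temps_f]
    return columns


daily_mean_temps_f = [41.5, 58.0, 72.5]  # made-up daily mean temperatures (degF)
columns = degree_day_columns(daily_mean_temps_f, [60, 61], [70])
for name in sorted(columns):
    print(name, columns[name])
```

Each balance point produces its own column, which is why the grid search needs one cdd_<> or hdd_<> column per candidate balance point.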

Armed with a design matrix of the form created above, you can use eemeter.fit_caltrack_usage_per_day_model to fit a model.

You may also wish to filter your data to a baseline period or a reporting period. To do so, use eemeter.get_baseline_data or eemeter.get_reporting_data. For example:

>>> import datetime
>>> import pytz
>>> baseline_end_date = datetime.datetime(2016, 12, 26, 0, 0, tzinfo=pytz.UTC)
>>> baseline_data, warnings = eemeter.get_baseline_data(
...     data, end=baseline_end_date, max_days=365)
>>> print(baseline_data.head())
                           meter_value  cdd_70     hdd_60     hdd_61  \
2015-12-27 00:00:00+00:00        25.55     0.0  18.093333  19.093333
2015-12-28 00:00:00+00:00        26.46     0.0  22.478333  23.478333
2015-12-29 00:00:00+00:00        30.38     0.0  25.003333  26.003333
2015-12-30 00:00:00+00:00        49.82     0.0  29.161667  30.161667
2015-12-31 00:00:00+00:00        34.47     0.0  29.572917  30.572917

                           n_days_dropped  n_days_kept
2015-12-27 00:00:00+00:00             0.0          1.0
2015-12-28 00:00:00+00:00             0.0          1.0
2015-12-29 00:00:00+00:00             0.0          1.0
2015-12-30 00:00:00+00:00             0.0          1.0
2015-12-31 00:00:00+00:00             0.0          1.0

CalTRACK Daily Methods

Running CalTRACK daily methods is easy once you have the data in the right format. The fitting method returns an eemeter.CalTRACKUsagePerDayModelResults object:

>>> model_results = eemeter.fit_caltrack_usage_per_day_model(baseline_data)

This object can be dumped into a JSON string:

>>> import json
>>> print(json.dumps(model_results.json(), indent=2))

It can be inspected for more detailed information:

>>> model_results.totals_metrics.r_squared_adj
0.7294645737524558

Or plotted (use with eemeter.plot_energy_signature for an overlay on the fitted data):

>>> model_results.plot()

CalTRACK Billing Methods

Running caltrack billing methods:

>>> model_results = eemeter.fit_caltrack_usage_per_day_model(
...     baseline_data, use_billing_presets=True)

It is essential that the data used in the CalTRACK billing methods is average daily period usage (UPDm) and degree day values.

Data with this property is created by default by the eemeter.create_caltrack_billing_design_matrix method.

Using the CLI

The CLI can be used to run the CalTRACK methods directly against CSV data. To allow users without immediate access to data to get started quickly with the eemeter package, the CLI also allows using sample data that comes with eemeter.

Use CalTRACK methods on sample data:

$ eemeter caltrack --sample=il-electricity-cdd-hdd-daily
Loading sample: il-electricity-cdd-hdd-daily
{
  "status": "SUCCESS",
  "method_name": "caltrack_daily_method",
  "model": {
    "model_type": "cdd_hdd",
    "formula": "meter_value ~ cdd_65 + hdd_55",
    "status": "QUALIFIED",
    "model_params": {
      "intercept": 10.733478866990144,
      "beta_cdd": 2.039525988684711,
      "beta_hdd": 1.0665644257451434,
      "cooling_balance_point": 65,
      "heating_balance_point": 55
    },
    "r_squared_adj": 0.7810065909435654,
    "warnings": []
  },
  "r_squared_adj": 0.7810065909435654,
  "warnings": [],
  "metadata": {},
  "settings": {
    "fit_cdd": true,
    "minimum_non_zero_cdd": 10,
    "minimum_non_zero_hdd": 10,
    "minimum_total_cdd": 20,
    "minimum_total_hdd": 20,
    "beta_cdd_maximum_p_value": 0.1,
    "beta_hdd_maximum_p_value": 0.1
  }
}
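
To make the model_params in the output above more concrete, here is a minimal sketch that evaluates them for a single day. It assumes the formula "meter_value ~ cdd_65 + hdd_55" is a linear combination of the intercept and the two degree day terms, with degree days taken from the daily mean temperature; the function name predict_usage_per_day is hypothetical and not part of eemeter.

```python
def predict_usage_per_day(mean_temp_f, params):
    """Evaluate intercept + beta_cdd * cdd + beta_hdd * hdd for one day."""
    cdd = max(mean_temp_f - params["cooling_balance_point"], 0.0)
    hdd = max(params["heating_balance_point"] - mean_temp_f, 0.0)
    return params["intercept"] + params["beta_cdd"] * cdd + params["beta_hdd"] * hdd


# model_params copied from the CLI output above
params = {
    "intercept": 10.733478866990144,
    "beta_cdd": 2.039525988684711,
    "beta_hdd": 1.0665644257451434,
    "cooling_balance_point": 65,
    "heating_balance_point": 55,
}

mild_day = predict_usage_per_day(60.0, params)  # between balance points: intercept only
hot_day = predict_usage_per_day(75.0, params)   # 10 cooling degree days
cold_day = predict_usage_per_day(35.0, params)  # 20 heating degree days
print(mild_day, hot_day, cold_day)
```

Between the two balance points both degree day terms are zero, so the prediction is just the intercept (the temperature-independent base load).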

Save output:

$ eemeter caltrack --sample=il-electricity-cdd-only-billing_monthly \
--output-file=/path/to/output.json
Loading sample: il-electricity-cdd-only-billing_monthly
Output written: /path/to/output.json

Load custom data (see sample files for example format):

$ eemeter caltrack --meter-file=/path/to/meter/data.csv \
--temperature-file=/path/to/temperature/data.csv

Do not fit CDD models (intended for gas data):

$ eemeter caltrack --sample=il-gas-hdd-only-billing_monthly --no-fit-cdd

To include all candidate models in output:

$ eemeter caltrack --sample=il-electricity-cdd-hdd-daily --show-candidates

Understanding eemeter warnings

The eemeter package tries to give warnings whenever a result is less than perfect. Warnings appear throughout the eemeter and are given in the structure eemeter.EEMeterWarning.

Each warning has the following structure:

  1. A ‘dotted’ hierarchical name (eemeter.EEMeterWarning.qualified_name) summarizing its origin and nature. For example: 'eemeter.caltrack_method.no_candidate_models'
  2. A full description of the error in prose (eemeter.EEMeterWarning.description).
  3. A set of relevant data about the error, such as limits that were passed ( eemeter.EEMeterWarning.data).
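
The three-part structure lends itself to simple filtering by name prefix. The sketch below uses a hypothetical stand-in class, SketchWarning, that only mirrors the three attributes listed above; it is not eemeter.EEMeterWarning itself, and the descriptions and the second warning name are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SketchWarning:
    """Hypothetical stand-in mirroring the three parts of an eemeter warning."""

    qualified_name: str
    description: str
    data: dict = field(default_factory=dict)


def filter_warnings(warnings, prefix):
    """Keep only warnings whose dotted name starts with the given prefix."""
    return [w for w in warnings if w.qualified_name.startswith(prefix)]


warnings = [
    SketchWarning(
        qualified_name="eemeter.caltrack_method.no_candidate_models",
        description="No candidate models were generated.",  # made-up text
    ),
    SketchWarning(
        qualified_name="example.other.warning",  # made-up name
        description="An unrelated warning.",
        data={"limit": 10},
    ),
]

caltrack_warnings = filter_warnings(warnings, "eemeter.caltrack_method")
for w in caltrack_warnings:
    print(w.qualified_name, "-", w.description)
```

Because the names are hierarchical, a prefix match is enough to select every warning raised by one part of the library.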

Visualization

Plotting results and models.

Plot an energy signature (eemeter.plot_energy_signature):

>>> eemeter.plot_energy_signature(meter_data, temperature_data)
_images/plot_energy_signature.png

Plot a time series of meter data and temperature data (eemeter.plot_time_series):

>>> eemeter.plot_time_series(meter_data, temperature_data)
_images/plot_time_series.png

Plot the selected model and all candidate models (eemeter.CalTRACKUsagePerDayModelResults.plot) on top of an energy signature:

>>> ax = eemeter.plot_energy_signature(meter_data, temperature_data)
>>> model_results.plot(ax=ax, with_candidates=True)
_images/plot_model_results.png

Plot a single candidate model (eemeter.CalTRACKUsagePerDayCandidateModel.plot):

>>> model_results.model.plot()
_images/plot_candidate.png

The plot functions are flexible and can take quite a few parameters, including custom titles, labels, Matplotlib Axes, and color options.

Obtaining weather data

Weather data can be obtained using the EEweather package.

Definitely check out the full docs, but here’s a taste of what that’s like.

Installation:

$ pip install eeweather

Usage:

>>> import eeweather
>>> result = eeweather.match_lat_long(35, -95)
>>> result
ISDStationMapping('722178')
>>> result.distance_meters
34672
>>> station = result.isd_station
>>> station
ISDStation('722178')
>>> import datetime
>>> import pytz
>>> start_date = datetime.datetime(2016, 6, 1, tzinfo=pytz.UTC)
>>> end_date = datetime.datetime(2017, 9, 15, tzinfo=pytz.UTC)
>>> tempC = station.load_isd_hourly_temp_data(start_date, end_date)
>>> tempC.head()
2016-06-01 00:00:00+00:00    28.291500
2016-06-01 01:00:00+00:00    27.438500
2016-06-01 02:00:00+00:00    27.197083
2016-06-01 03:00:00+00:00    26.898750
2016-06-01 04:00:00+00:00    26.701810
Freq: H, dtype: float64
>>> tempF = tempC * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    82.924700
2016-06-01 01:00:00+00:00    81.389300
2016-06-01 02:00:00+00:00    80.954750
2016-06-01 03:00:00+00:00    80.417750
2016-06-01 04:00:00+00:00    80.063259

Using with Anaconda

Some users find that the easiest way to get a working Python distribution is to use Anaconda (hello Windows users!). Anaconda is a free distribution of Python that comes with all of the dependencies of eemeter and a host of other useful scientific packages.

If that sounds appealing to you, please follow the installation instructions that Anaconda provides, then come back here armed with a shiny new python distribution and install eemeter at an anaconda shell with $ pip install eemeter.

Advanced Usage

CalTRACK method options

TODO. For now, see CalTRACK for full set of options.

CalTRACK Data Sufficiency Criteria

Compute data sufficiency (eemeter.caltrack_sufficiency_criteria):

>>> data_sufficiency = eemeter.caltrack_sufficiency_criteria(data)

About the CalTRACK methods

The eemeter library is the reference implementation of the CalTRACK methods, but it is not the CalTRACK methods. CalTRACK refers to the methods themselves, for which the documentation is not kept in this repository. The most current information about methods and proposed changes can be found on github.

How the models work

We’re planning a deeper dive on the methods here, but for now, see openee.io, dig into the CalTRACK (try viewing the source link), or try visualizing some of the models built with the sample data.

For developers

Contributing

We highly encourage contributing to eemeter. To contribute, please create an issue or a pull request.

Dev installation

We use docker for development. To get started with docker, see docker installation.

Fork and clone a local copy of the repository:

$ git clone git@github.com:YOURUSERNAME/eemeter.git eemeter
$ cd eemeter
$ git remote add upstream git://github.com/openeemeter/eemeter.git

Then try one of the following:

Open a jupyter notebook:

$ docker-compose up jupyter

Build a local version of the docs:

$ docker-compose up docs

Run the tests:

$ docker-compose run --rm test

Open up a shell:

$ docker-compose up shell

Command-line Usage

Once installed, eemeter can be run from the command-line. To see all available commands, run eemeter --help.

Use CalTRACK methods on sample data:

$ eemeter caltrack --sample=il-electricity-cdd-hdd-daily

Save output:

$ eemeter caltrack --sample=il-electricity-cdd-only-billing_monthly --output-file=/path/to/output.json

Load custom data (see eemeter.meter_data_from_csv and eemeter.temperature_data_from_csv for formatting):

$ eemeter caltrack --meter-file=/path/to/meter/data.csv --temperature-file=/path/to/temperature/data.csv

Do not fit CDD-based candidate models (intended for gas data):

$ eemeter caltrack --sample=il-gas-hdd-only-billing_bimonthly --no-fit-cdd

Tutorial

Note

This tutorial assumes you have a working knowledge of the pandas library, a key eemeter dependency. For eemeter installation instructions, see Installation. If you’re new to pandas, the 10 minutes to pandas tutorial is a good primer. We recommend reading that and then coming back here.

Outline

This tutorial is a self-paced walkthrough of how to use the eemeter package. It demonstrates how to use the package to run the CalTRACK Hourly, Daily, and Billing methods.

Quickstart

Some folks may just want to see the code all in one place. This code is explained in more detail in the course of the tutorial below. See also CalTRACK Daily and Billing (Usage per Day).

Quickstart for CalTRACK Billing/Daily

Here’s how to run the CalTRACK billing/daily model. See also CalTRACK Daily and Billing (Usage per Day):

import eemeter

meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-daily")
)

# The dates of an analysis "blackout" period during which a project was performed.
# This is synonymous with the CalTRACK "Intervention period" (See CalTRACK 1.4.4)
blackout_start_date = sample_metadata["blackout_start_date"]
blackout_end_date = sample_metadata["blackout_end_date"]

# get meter data suitable for fitting a baseline model
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=blackout_start_date, max_days=365
)

# create a design matrix (the input to the model fitting step)
baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)

# build a CalTRACK model
baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    baseline_design_matrix,
)

# get a year of reporting period data
reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

# compute metered savings for the year of the reporting period we've selected
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

# total metered savings
total_metered_savings = metered_savings_dataframe.metered_savings.sum()

Quickstart for CalTRACK Hourly

And here’s how to run the CalTRACK hourly model. Again, this is explained in more detail below. See also CalTRACK Hourly:

import eemeter

meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-hourly")
)

# the dates of an analysis "blackout" period during which a project was performed.
blackout_start_date = sample_metadata["blackout_start_date"]
blackout_end_date = sample_metadata["blackout_end_date"]

# get meter data suitable for fitting a baseline model
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=blackout_start_date, max_days=365
)

# create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data, temperature_data,
    )
)

# build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'three_month_weighted'
)

# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# assign temperatures to bins
occupied_temperature_bins, unoccupied_temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    occupancy_lookup=occupancy_lookup,
)

# build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        occupied_temperature_bins,
        unoccupied_temperature_bins,
    )
)

# build a CalTRACK hourly model
baseline_model = eemeter.fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    occupied_temperature_bins,
    unoccupied_temperature_bins,
)

# get a year of reporting period data
reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

# compute metered savings for the year of the reporting period we've selected
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

# total metered savings
total_metered_savings = metered_savings_dataframe.metered_savings.sum()

Data Formats

The three essential inputs to eemeter library functions are the following:

  1. Meter Data
  2. Temperature Data from a nearby weather station
  3. Project or intervention dates

Pandas Data Formats

We use pandas data formats in order to take advantage of the powerful data analysis tools provided in that package that users may already be familiar with. The specifics of these formats are discussed in more detail below.

Please refer directly to the excellent pandas documentation for instructions for loading data (e.g., pandas.read_csv). The eemeter does come packaged with loading methods, but these will only work for particular data formats. Useful eemeter methods for loading and manipulating data include eemeter.meter_data_from_csv, eemeter.temperature_data_from_csv, eemeter.get_baseline_data, and eemeter.get_reporting_data.

Meter Data

Meter data is stored as a pandas.DataFrame with a pandas.DatetimeIndex. Your data must be in the format demonstrated below to work with the eemeter library.

By convention,

  1. Meter data are stored by period start date.
  2. The length of each period is determined by the start date of the next value (which may be null).
  3. The end date of the last period is given by a single nan-valued period appended at the end of the data for completeness.
  4. The name of the dataframe column is “value”. Units must be tracked separately.
  5. The datetimes in the index must be timezone-aware.
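
The first three conventions mean that period lengths are implied by consecutive start dates, with the final NaN-valued row only marking where the last period ends. A minimal standard-library sketch of that reading follows; the helper name period_lengths_in_days and the usage values are hypothetical.

```python
from datetime import datetime, timezone


def period_lengths_in_days(starts):
    """Length of each period, taken from the start of the next period."""
    for dt in starts:
        # every timestamp must be timezone-aware
        if dt.tzinfo is None or dt.utcoffset() is None:
            raise ValueError("timestamps must be timezone-aware")
    return [
        (nxt - cur).total_seconds() / 86400.0
        for cur, nxt in zip(starts, starts[1:])
    ]


# three billing periods; the last row only marks the end of the final period
starts = [
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    datetime(2018, 2, 1, tzinfo=timezone.utc),
    datetime(2018, 3, 1, tzinfo=timezone.utc),
    datetime(2018, 4, 1, tzinfo=timezone.utc),
]
values = [310.0, 280.0, 310.0, float("nan")]  # made-up usage; final value is NaN

lengths = period_lengths_in_days(starts)
print(lengths)
```

Four start dates describe three periods, which is why the data needs one more row than it has usage values.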

Some examples of the eemeter meter data format:

import numpy as np
import pandas as pd

# one year of daily data
meter_data = pd.DataFrame(
    {"value": [1] * 365 + [np.nan]},
    index=pd.date_range("2018-01-01", "2019-01-01", freq="D", tz="UTC", name="start")
)

# two years of monthly data
meter_data = pd.DataFrame(
    {"value": [1] * 24 + [np.nan]},
    index=pd.date_range("2017-01-01", "2019-01-01", freq="MS", tz="UTC", name="start")
)

# three months of 15-minute interval data
meter_data = pd.DataFrame(
    {"value": [1] * 90 * 24 * 4 + [np.nan]},
    index=pd.date_range("2018-01-01", "2018-04-01", freq="15T", tz="UTC", name="start")
)

Temperature Data

Temperature data is stored as a pandas.Series with a pandas.DatetimeIndex. While temperature data from any source can be used, the eeweather library is designed specifically to provide temperature data from public sources for eemeter users.

The eeweather library helps perform site to weather station matching and can pull temperature data directly from public (US) data sources.

By convention,

  1. Temperature data must be given with an hourly frequency.
  2. The datetimes in the index must be timezone-aware.

An example of the eemeter temperature data format:

import numpy as np
import pandas as pd

# three months of hourly interval data
temperature_data = pd.Series(
    [1] * 24 * 90 + [np.nan],
    index=pd.date_range('2018-01-01', '2018-04-01', freq='H', tz='UTC')
)

Using EEweather

Given a site location specified by lat/long coordinate, eeweather can find an appropriate nearby weather station within the same climate zone and pull temperature data directly from public sources:

# requires user to `$ pip install eeweather sqlalchemy`
from datetime import datetime
import pytz
import eeweather

latitude = 38.1
longitude = -118.3
ranked_stations_closest_within_climate_zone = eeweather.rank_stations(
    latitude,
    longitude,
    match_iecc_climate_zone=True,
    match_iecc_moisture_regime=True,
    match_ba_climate_zone=True,
    match_ca_climate_zone=True,
    max_distance_meters=100000,
)

ranked_stations_closest_anywhere = eeweather.rank_stations(
    latitude,
    longitude,
)

ranked_stations = eeweather.combine_ranked_stations([
    ranked_stations_closest_within_climate_zone,
    ranked_stations_closest_anywhere,
])

start_date = datetime(2018, 1, 1, tzinfo=pytz.UTC)
end_date = datetime(2019, 1, 1, tzinfo=pytz.UTC)
selected_station, warnings = eeweather.select_station(
    ranked_stations,
    coverage_range=(start_date, end_date)
)
selected_station.usaf_id
temp_degC, warnings = selected_station.load_isd_hourly_temp_data(
    start_date, end_date
)
temp_degF = temp_degC * 9 / 5 + 32

Sample Data

If you’d like to continue with this tutorial without loading in your own data, you can use the fake data provided as samples along with this library:

# hourly
meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-hourly")
)

# daily
meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-daily")
)

# other samples
sample_names = eemeter.samples()

Building a Baseline Model

The CalTRACK methods require building a model of the usage during the baseline period and then projecting that forward into the reporting period to calculate avoided energy use. Before we can build the baseline model, we need to isolate 365 days of meter data immediately prior to the end of the baseline period.
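
In arithmetic terms, avoided energy use is the difference between what the baseline model projects for the reporting period and what the meter actually recorded. A minimal sketch with made-up numbers follows; the function name avoided_energy_use is hypothetical and this is not the eemeter.metered_savings implementation.

```python
def avoided_energy_use(counterfactual_usage, observed_usage):
    """Per-period and total difference between projected and observed usage."""
    per_period = [c - o for c, o in zip(counterfactual_usage, observed_usage)]
    return per_period, sum(per_period)


# made-up daily values for a three-day reporting period
counterfactual = [30.0, 28.0, 32.0]  # what the baseline model projects
observed = [25.0, 27.0, 26.0]        # what the meter recorded

per_day, total = avoided_energy_use(counterfactual, observed)
print(per_day, total)
```

A positive total means the building used less energy than the baseline model projected.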

The eemeter.get_baseline_data method pulls data for a 365-day baseline period by slicing backward from a project date:

import numpy as np
import pandas as pd
import eemeter
from datetime import datetime
import pytz

meter_data = pd.DataFrame(
    {"value": [1] * 730 + [np.nan]},
    index=pd.date_range("2017-01-01", "2019-01-01", freq="D", tz="UTC", name="start")
)
baseline_end_date = datetime(2018, 6, 1, tzinfo=pytz.UTC)
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=baseline_end_date, max_days=365
)
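
Conceptually, the slice keeps the periods that start within max_days before the end date. The standard-library sketch below shows that idea only; the helper name slice_baseline is hypothetical, and the exact boundary handling, trailing NaN row, and warnings produced by eemeter.get_baseline_data are not reproduced here.

```python
from datetime import datetime, timedelta, timezone


def slice_baseline(starts, end, max_days):
    """Keep period starts that fall within max_days before the end date."""
    earliest = end - timedelta(days=max_days)
    return [dt for dt in starts if earliest <= dt < end]


first_day = datetime(2017, 1, 1, tzinfo=timezone.utc)
starts = [first_day + timedelta(days=i) for i in range(731)]  # two years of daily starts
baseline_end_date = datetime(2018, 6, 1, tzinfo=timezone.utc)

baseline_starts = slice_baseline(starts, baseline_end_date, max_days=365)
print(len(baseline_starts), baseline_starts[0], baseline_starts[-1])
```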

With baseline data isolated, we can build a baseline model. There are currently two options for this: hourly and billing/daily.

CalTRACK Daily/Billing Methods

The CalTRACK daily and billing methods specify a way of modeling the weather-dependent energy signature of a building. They select the model that fits the data best from a set of candidate models. The parameters of the model are the heating and cooling balance points (i.e., the temperatures at which heating- or cooling-related energy use tends to kick in) and the heating and cooling beta parameters, which define the slope of the energy response to incremental differences between outdoor temperature and the balance point. We’ll do a grid search over possible heating and cooling balance points and fit models to the heating and cooling degree days defined by the outdoor temperatures and each of those balance points. To do this, we precompute the heating and cooling degree days using the methods below before we feed them into the modeling routines.
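
The grid search idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the CalTRACK selection procedure: it fits a heating-only model by ordinary least squares for each candidate balance point and ranks candidates by plain R-squared, ignoring cooling terms and the qualification settings the real methods apply. All function names and data here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def grid_search_heating_balance_point(temps, usage, balance_points):
    """Fit one heating-only candidate per balance point and keep the best fit."""
    best = None
    for bp in balance_points:
        hdd = [max(bp - t, 0.0) for t in temps]
        if len(set(hdd)) < 2:
            continue  # no variation in degree days, nothing to fit
        intercept, slope, r_squared = fit_line(hdd, usage)
        candidate = {
            "heating_balance_point": bp,
            "intercept": intercept,
            "beta_hdd": slope,
            "r_squared": r_squared,
        }
        if best is None or r_squared > best["r_squared"]:
            best = candidate
    return best


# synthetic data generated from a known model: 10 + 1.5 * hdd_60
temps = [30.0, 40.0, 50.0, 55.0, 60.0, 65.0, 70.0, 80.0]
usage = [10.0 + 1.5 * max(60.0 - t, 0.0) for t in temps]

best = grid_search_heating_balance_point(temps, usage, range(50, 71, 5))
print(best)
```

On this synthetic data the search recovers the balance point used to generate it, because only that candidate makes usage an exactly linear function of degree days.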

To run the CalTRACK Daily/Billing methods:

# create a design matrix suitable for use with daily data
baseline_design_matrix = eemeter.create_caltrack_daily_design_matrix(
    baseline_meter_data, temperature_data,
)

# create a design matrix suitable for use with billing data
baseline_design_matrix = eemeter.create_caltrack_billing_design_matrix(
    baseline_meter_data, temperature_data,
)

# build a CalTRACK model
baseline_model = eemeter.fit_caltrack_usage_per_day_model(
    baseline_design_matrix,
)

These methods are shortcuts. Behind the scenes, they combine meter data and temperature data into a single DataFrame, using eemeter.compute_usage_per_day_feature to transform the meter data into usage per day and eemeter.compute_temperature_features to create a search grid of heating and cooling degree day values. The shortcut methods use the wide balance point ranges recommended by CalTRACK and combine the two sets of features using eemeter.merge_features.

If using billing data, note that the values represented in the design matrix created by calling eemeter.compute_usage_per_day_feature are returned as average usage per day, as specified by the CalTRACK methods, not as totals per period, as they are represented in the inputs. The heating/cooling degree days returned by compute_temperature_features are also average heating/cooling degree days per day, and not total heating/cooling degree days per period. This averaging behavior can be modified with the use_mean_daily_values parameter, which is set to True by default.
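
The conversion from period totals to average usage per day is a division by period length. The sketch below shows that arithmetic with the standard library only; the helper name usage_per_day and the billing totals are hypothetical, and this is not the eemeter.compute_usage_per_day_feature implementation.

```python
from datetime import date


def usage_per_day(period_starts, period_totals):
    """Convert period totals into average usage per day for each period."""
    averages = []
    for i, total in enumerate(period_totals):
        # the period ends where the next one starts
        n_days = (period_starts[i + 1] - period_starts[i]).days
        averages.append(total / n_days)
    return averages


# three monthly bills; the final start date only marks the end of the last period
period_starts = [date(2018, 1, 1), date(2018, 2, 1), date(2018, 3, 1), date(2018, 4, 1)]
period_totals = [620.0, 420.0, 465.0]  # made-up billing totals

daily_averages = usage_per_day(period_starts, period_totals)
print(daily_averages)
```

Note that the February and March bills differ in total but have the same average daily usage, which is why the averaged form is the one compared against average degree days per day.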

CalTRACK Hourly Methods

The CalTRACK hourly methods require a multi-stage dataset creation process which is a bit more involved than the daily/billing dataset creation process above. There are two primary reasons for this extra complexity. First, unlike the daily/billing methods, the hourly methods build separate models for each calendar month, which adds a few extra steps. Second, also unlike the billing and daily methods, there are two features of the dataset creation which must themselves be fitted to a preliminary dataset: occupancy features and temperature bin features.

Preliminary design matrix

The preliminary design matrix has some simple time and temperature features. These features do not vary by segment and are precursors to other features (see below for a better explanation of segmentation). This step looks a lot like the daily/billing dataset creation. These features are used subsequently to fit the occupancy and temperature bin features.

The preliminary design matrix has only two fixed heating (50 degF) and cooling (65 degF) degree day columns - these are used to fit the occupancy model. It also has an hour of week column, which is a categorical variable indicating the hour of the week using an integer from 0 to 167 (i.e., 7 days * 24 hours/day). 0 is Monday midnight to 1am.
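
The hour of week index described above can be computed directly from a timestamp. A minimal sketch, assuming the timestamp is already expressed in the time zone you want the week to follow; the function name hour_of_week is hypothetical.

```python
from datetime import datetime, timezone


def hour_of_week(dt):
    """0 is Monday 00:00-01:00; 167 is Sunday 23:00-24:00."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday
    return dt.weekday() * 24 + dt.hour


monday_midnight = datetime(2018, 1, 1, 0, tzinfo=timezone.utc)  # 2018-01-01 was a Monday
sunday_late = datetime(2018, 1, 7, 23, tzinfo=timezone.utc)

print(hour_of_week(monday_midnight), hour_of_week(sunday_late))
```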

Segmentation

CalTRACK hourly requires creating independent models for each month of a dataset. The eemeter package calls this “segmentation”. Segmentation breaks a dataset into $n$ named and weighted subsets.

Before we can move on to the next steps of creating the CalTRACK hourly dataset, we need to create a monthly segmentation for the hourly data. We will use this to create 12 independent hourly models - one for each month of the calendar year. The eemeter function for creating these weights is called eemeter.segment_time_series and it takes a pandas.DatetimeIndex as input.

This segmentation matrix contains one column for each segment (12 in all), each of which contains the segmentation weights for that segment. The segmentation scheme we use here has one segment for each month, which contains a single fully weighted calendar month and two half-weighted neighboring calendar months. The eemeter code name for this segmentation scheme is ‘three_month_weighted’ (there are also ‘all’, ‘one_month’, and ‘three_month’, each of which behaves a bit differently).

We are creating this segmentation over the time index of the baseline period that is represented in the preliminary hourly design matrix.
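
The weighting scheme can be illustrated without eemeter. The sketch below builds, for each calendar month, the weights that a fully weighted center month and two half-weighted neighbors would imply. The segment labels and the function name are illustrative assumptions, not the names eemeter uses.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weights(month):
    """Weights by calendar month (1-12) for the segment centered on `month`."""
    prev_month = (month - 2) % 12 + 1  # wraps January back to December
    next_month = month % 12 + 1        # wraps December forward to January
    weights = {m: 0.0 for m in range(1, 13)}
    weights[prev_month] = 0.5
    weights[next_month] = 0.5
    weights[month] = 1.0
    return weights


# one segment per calendar month; the labels are illustrative only
segments = {MONTHS[m - 1]: three_month_weights(m) for m in range(1, 13)}
```

Each hourly reading would then take the weight of its calendar month within each segment, so a January reading is fully weighted in the January segment and half-weighted in the December and February segments.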

Occupancy

Occupancy is estimated by building a simple model from the preliminary design matrix hdd_50 and cdd_65 columns. This is done for each segment independently, so results are returned as a dataframe with one segment of results per column. The segmentation argument indicates that the analysis should be done once per segment. Occupancy is determined by hour of week category. A value of 1 for a particular hour indicates an “occupied” mode, and a value of 0 indicates “unoccupied” mode. These modes are determined by the tendency of the hdd_50/cdd_65 model to over- or under-predict usage for that hour, given a particular threshold between 0 and 1 (default 0.65): if the percent of underpredictions (by count) is lower than that threshold, then the mode is “unoccupied”; otherwise the mode is “occupied”.

The occupancy lookup is organized by hour of week (rows) and model segment (columns).
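
The threshold rule can be sketched for a single segment as follows. This assumes the observed and predicted values from the simple hdd_50/cdd_65 model are already available; the function name and the sample numbers are made up for illustration.

```python
from collections import defaultdict


def occupancy_by_hour_of_week(hours_of_week, observed, predicted, threshold=0.65):
    """Return {hour_of_week: 1 or 0} from the share of underpredictions."""
    under = defaultdict(int)
    total = defaultdict(int)
    for how, actual, pred in zip(hours_of_week, observed, predicted):
        total[how] += 1
        if pred < actual:  # the model underpredicted usage for this reading
            under[how] += 1
    # below the threshold means "unoccupied" (0), otherwise "occupied" (1)
    return {
        how: 0 if under[how] / total[how] < threshold else 1
        for how in sorted(total)
    }


# made-up readings for two hours of the week (hour 9 and hour 27)
hours = [9, 9, 9, 9, 27, 27, 27, 27]
observed = [5.0, 6.0, 5.5, 6.5, 1.0, 1.2, 0.9, 1.1]
predicted = [4.0, 5.0, 5.0, 7.0, 1.5, 1.0, 1.4, 1.3]

lookup = occupancy_by_hour_of_week(hours, observed, predicted)
```

Here hour 9 is underpredicted in three of four readings and is marked occupied, while hour 27 is underpredicted in one of four readings and is marked unoccupied.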

Fitting segmented temperature bins

Temperature bins are fit for each segment such that each bin has a sufficient number of temperature readings (20 per bin, by default). Bins are defined by starting with a proposed set of bins (see the default_bins argument) and systematically dropping bin endpoints if they do not meet sufficiency requirements. Bins themselves are not dropped but are effectively combined with neighboring bins. Except for the fact that zero-weighted times are dropped, segment weights are not considered when fitting temperature bins.

Because bin fitting and validation is done independently for each segment, results are returned as a dataframe with one segment of results per column. The contents of the dataframe are boolean indicators of whether the bin endpoint should be used for temperatures in that segment. Some bin endpoints are dropped because of insufficient reading counts. The bin endpoints that are dropped for each segment are given a value of False. You’ll notice in this dataset that the winter months tend to have combined high temperature bins and the summer months tend to have combined low temperature bins.

  • eemeter.fit_temperature_bins: Fit temperature bins to data, dropping bin endpoints for bins that do not meet the minimum temperature count such that remaining bins meet the minimum count.
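
The idea of dropping endpoints until every remaining bin has enough readings can be sketched as below. This is a simplified variant written for illustration: the order in which endpoints are dropped, the function names, and the sample endpoints are assumptions, and the actual eemeter.fit_temperature_bins procedure may differ.

```python
from bisect import bisect_right


def bin_counts(temps, endpoints):
    """Count readings per bin; a reading equal to an endpoint goes in the upper bin."""
    counts = [0] * (len(endpoints) + 1)
    for t in temps:
        counts[bisect_right(endpoints, t)] += 1
    return counts


def fit_bins_sketch(temps, default_endpoints, min_count=20):
    """Return {endpoint: bool}, True if the endpoint is kept."""
    endpoints = sorted(default_endpoints)
    while endpoints:
        counts = bin_counts(temps, endpoints)
        sparse = [i for i, c in enumerate(counts) if c < min_count]
        if not sparse:
            break
        i = sparse[0]
        # merge the sparse bin with a neighbor by removing the endpoint on its
        # upper side, or on its lower side if it is the last bin
        endpoints.pop(i if i < len(endpoints) else i - 1)
    return {e: (e in endpoints) for e in sorted(default_endpoints)}


# made-up winter-like readings: mostly cold, very few hot hours
temps = [40] * 30 + [60] * 30 + [95] * 5
kept = fit_bins_sketch(temps, [30, 45, 55, 65, 75, 90])
```

With these made-up readings only the 45 degree endpoint survives, which mirrors the pattern described above of sparse high temperature bins being combined.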

With these features in hand, we can now combine them into a segmented dataset using the helper function eemeter.iterate_segmented_dataset and a prefabricated feature processor, eemeter.caltrack_hourly_fit_feature_processor, which is provided to assist in creating the segmented dataset given a preliminary design matrix of the form created above. The feature processor transforms each segment of the dataset using the occupancy lookup and temperature bins created above. We are creating a python dict of pandas.DataFrame objects, one for each time series segment encountered in the baseline data. The keys of the dict are segment names. The values are DataFrame objects containing the exact data needed to fit a CalTRACK hourly model.
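
The shape of the resulting structure, a dict keyed by segment name, can be sketched with plain Python lists standing in for DataFrames. The helper name, segment labels, and sample rows are hypothetical, and the sketch assumes that zero-weighted rows are left out of each segment.

```python
def split_by_segment(rows, segment_weights):
    """rows: list of dicts; segment_weights: {segment_name: [weight per row]}."""
    segmented = {}
    for name, weights in segment_weights.items():
        segmented[name] = [
            dict(row, weight=w)  # copy the row and attach its segment weight
            for row, w in zip(rows, weights)
            if w > 0  # zero-weighted times are left out of the segment
        ]
    return segmented


rows = [
    {"hour": "2018-01-15 00:00", "meter_value": 1.2},
    {"hour": "2018-02-15 00:00", "meter_value": 1.0},
    {"hour": "2018-03-15 00:00", "meter_value": 0.8},
]
segment_weights = {
    "jan": [1.0, 0.5, 0.0],
    "feb": [0.5, 1.0, 0.5],
}

segmented = split_by_segment(rows, segment_weights)
```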

Putting it all together

To run the CalTRACK Hourly methods:

# create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data, temperature_data,
    )
)

# build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'three_month_weighted'
)

# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# assign temperatures to bins
occupied_temperature_bins, unoccupied_temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    occupancy_lookup=occupancy_lookup,
)

# build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        occupied_temperature_bins,
        unoccupied_temperature_bins,
    )
)

# build a CalTRACK hourly model
baseline_model = eemeter.fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    occupied_temperature_bins,
    unoccupied_temperature_bins,
)

Computing CalTRACK metered savings

Suppose we wanted to calculate metered savings for the year following a project intervention. This could be accomplished by first slicing the original meter data down to the subset in the first year following an intervention.

The eemeter.metered_savings method performs the logic of estimating counterfactual baseline reporting period usage. For this, it requires the fitted baseline model, the reporting period meter data (for its index - so that it can be properly joined later), and corresponding temperature data. Note that this method can return results disaggregated into base load, cooling load, or heating load or as the aggregated usage. We do both here for demonstration purposes:

reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

metered_savings, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data, temperature_data
)

# with disaggregated usage predictions, billing/daily only
metered_savings, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data, temperature_data, with_disaggregated=True
)

This method also returns error bands which can be used for calculating Fractional Savings Uncertainty.
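
Conceptually, metered savings are the difference between the counterfactual usage predicted by the baseline model and the usage actually observed in the reporting period. The sketch below shows only that arithmetic with made-up numbers; it does not call eemeter and does not compute error bands.

```python
def metered_savings_sketch(counterfactual_usage, reporting_observed):
    """Per-period savings: predicted baseline usage minus observed usage."""
    return [c - o for c, o in zip(counterfactual_usage, reporting_observed)]


# made-up values for three reporting periods
counterfactual = [30.0, 28.0, 25.0]  # what the baseline model predicts
observed = [24.0, 25.0, 26.0]        # what the meter recorded

savings = metered_savings_sketch(counterfactual, observed)
total_savings = sum(savings)
```

A negative value, as in the third period here, means observed usage exceeded the counterfactual for that period.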

Computing (non-CalTRACK) typical year savings

If we want to compute annual weather normalized modeled savings, we’ll need a model of reporting period usage in addition to the model of baseline period usage created in the tutorial above:

import eemeter
import eeweather
import pandas as pd

# temperature data
normal_year_temperatures = (
    eeweather.ISDStation('722880').load_tmy3_hourly_temp_data()
)
# dates over which to predict
prediction_index = pd.date_range('2015-01-01', periods=365, freq='D', tz='UTC')
annualized_savings = eemeter.modeled_savings(
    baseline_model, reporting_model,
    prediction_index, normal_year_temperatures, with_disaggregated=True
)
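
The idea behind modeled savings is to run both fitted models over the same normal year temperatures and take the difference. A minimal sketch with two hypothetical heating-only models follows; the coefficients, balance point, and temperatures are invented for illustration and are not produced by eemeter.

```python
def make_heating_model(intercept, beta_hdd, balance_point):
    """Build a simple usage-per-day predictor of the hdd_only form."""
    def predict(temp_f):
        return intercept + beta_hdd * max(balance_point - temp_f, 0)
    return predict


# hypothetical fitted models for the baseline and reporting periods
baseline_predict = make_heating_model(intercept=10.0, beta_hdd=1.0, balance_point=60)
reporting_predict = make_heating_model(intercept=9.0, beta_hdd=0.5, balance_point=60)

# made-up normal year daily temperatures (degF)
normal_year_temps = [40, 50, 60, 70]

modeled_savings = [
    baseline_predict(t) - reporting_predict(t) for t in normal_year_temps
]
total_modeled_savings = sum(modeled_savings)
```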

Visualization

Fitted Billing/Daily models can be inspected by plotting an energy signature chart:

ax = eemeter.plot_energy_signature(meter_data, temperature_data)
baseline_model.plot(
    ax=ax, candidate_alpha=0.02, with_candidates=True, temp_range=(-5, 88)
)

Cautions

At time of writing (Sept 2019), the OpenEEmeter, as implemented in the eemeter package, contains the most complete open source implementation of the CalTRACK methods, which specify a way of calculating avoided energy use at a single energy meter at a single site. However, using the OpenEEmeter to calculate avoided energy use does not in itself guarantee compliance with the CalTRACK method specification, nor is using the OpenEEmeter a requirement of the CalTRACK methods. The eemeter package is a toolkit that may help with implementing a CalTRACK compliant analysis, as it provides a particular implementation of the CalTRACK methods which consists of a set of functions, parameters, and classes which can be configured to run the CalTRACK methods and variants. Please keep in mind while using the package that the eemeter assumes certain data cleaning tasks that are specified in the CalTRACK methods have occurred prior to usage with the eemeter. The package will create warnings to expose issues of this nature where possible.

The eemeter package is built for flexibility and modularity. While this is generally helpful and makes it possible to do more with the package, one potential consequence for users is that, without being careful to follow both the eemeter documentation and the guidance provided in the CalTRACK methods, it is very possible to use the eemeter in a way that does not comply with the CalTRACK methods. For example, while the CalTRACK methods set specific hard limits for the purpose of standardization and consistency, the eemeter can be configured to edit or entirely ignore those limits. The main reason for this flexibility is that the eemeter package is used not only to comply with the CalTRACK methods, but also to develop, test, and propose potential changes to those methods.

Rather than providing a single method that directly calculates avoided energy use from the required inputs, the eemeter library provides a series of modular functions that can be strung together in a variety of ways. The tutorial below describes common usage and sequencing of these functions, especially when it might not otherwise be apparent from the API Docs.

Some new users have assumed that the eemeter package constitutes an entire application suitable for running metering analytics at scale. This is not necessarily the case. It is designed instead to be embedded within other applications or to be used in one-off analyses. The eemeter is a toolbox that leaves to the user decisions about when to use or how to embed the provided tools within other applications. This limitation is an important consequence of the decision to make the methods and implementation as open and accessible as possible.

As you dive in, remember that this is a work in progress and that we welcome feedback and contributions. To contribute, please open an issue or a pull request on github.

CalTRACK Compliance

Checklist for CalTRACK compliance:

Section 2.2: Data Constraints

Section 2.2.1: Missing Values/Data Sufficiency
  • 2.2.1.1: eemeter.get_baseline_data must set max_days=365.
  • 2.2.1.2: eemeter.caltrack_sufficiency_criteria must set min_fraction_daily_coverage=0.9
  • 2.2.1.3: (Data Preparation) Missing values in input data have been represented as float('nan'), np.nan, or anything recognized as null by the method pandas.isnull.
  • 2.2.1.4: (Data Preparation) Values of 0 in electricity data have been converted to np.nan.
Section 2.2.2: Daily Data Sufficiency
Section 2.2.3: Billing Data Sufficiency
  • 2.2.3.1: (Data Preparation) Estimated reads in input data have been combined with subsequent reads up to a 70 day limit. Estimated reads count as missing values when evaluating the sufficiency criteria defined in 2.2.1.2.
  • 2.2.3.2: eemeter.create_caltrack_billing_design_matrix must set percent_hourly_coverage_per_billing_period=0.9 for eemeter.compute_temperature_features and eemeter.caltrack_sufficiency_criteria must set min_fraction_daily_coverage=0.9.
  • 2.2.3.3: (Data Preparation) Input meter data that represents billing periods less than 25 days long has been converted to np.nan.
  • 2.2.3.4: (Data Preparation) Input meter data that represents billing periods greater than 35 days long for pseudo-monthly billing period calculations and 70 days long for bi-monthly billing period calculations has been converted to np.nan.
Section 2.2.X: Other Data Sufficiency Requirements
  • 2.2.4: eemeter.caltrack_sufficiency_criteria must set requested_start and requested_end to receive critical warnings related to data outside of the requested period of analysis.
  • 2.2.5: (Data Preparation) Projects have been removed if the status of net metering has changed during the baseline or reporting periods.
  • 2.2.6: (Data Preparation) Projects have been removed if EV charging has been installed during the baseline or reporting periods.
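
Several of the (Data Preparation) items above are simple masking steps that must happen before data reaches the eemeter. As a sketch, items 2.2.1.4, 2.2.3.3, and 2.2.3.4 (pseudo-monthly case) might look like the following in plain Python. The helper names are hypothetical, and a real pipeline would more likely apply the same logic to pandas objects.

```python
import math

NAN = float("nan")


def zero_to_nan(values):
    """2.2.1.4: treat zero electricity readings as missing."""
    return [NAN if v == 0 else v for v in values]


def mask_billing_periods(period_days, values, min_days=25, max_days=35):
    """2.2.3.3 / 2.2.3.4: mask periods shorter than 25 or longer than 35 days."""
    return [
        v if min_days <= d <= max_days else NAN
        for d, v in zip(period_days, values)
    ]


# made-up electricity readings and billing period lengths
cleaned = zero_to_nan([12.5, 0, 9.0])
masked = mask_billing_periods([31, 20, 40, 25], [600.0, 410.0, 820.0, 500.0])
```

For bi-monthly billing data the upper limit would be 70 days instead of 35, per item 2.2.3.4.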

Section 2.3: Data Quality

Section 2.3.1: Impossible Dates
  • 2.3.1.1: (Data Preparation) For billing analysis, input meter data containing invalid dates for a valid month have been converted to the first date of that month.
  • 2.3.1.2: (Data Preparation) Input meter data containing invalid months/years have been removed and a warning has been generated.
Section 2.3.2: Duplicate Records
  • 2.3.2.1: (Data Preparation) Meter usage and temperature data has used matching time zone information to ensure that the upsampled values represent the same periods of time.
  • 2.3.2.2: (Data Preparation) If duplicate rows are found for meter data, then the project must be flagged as it may have sub-metering/multiple meters.
Section 2.3.X: Other Data Quality Requirements
  • 2.3.3: eemeter.merge_temperature_data: meter_data and temperature_data must be timezone-aware and have matching timezones.
  • 2.3.4: If NOAA weather data was used (which is roughly hourly), it has been normalized to hourly using eeweather.ISDStation.fetch_isd_hourly_temp_data.
  • 2.3.5: Warnings are generated in eemeter.caltrack_sufficiency_criteria if negative meter values are discovered as they indicate the possible presence of unreported net metering.
  • 2.3.6: (Data Preparation) Must generate warning for values that are more than three interquartile ranges larger than the median usage.
  • 2.3.7: (Audit) Resulting dataset of meter runs has been compared with expected counts of sites, meters, and projects.
  • 2.3.8: (Data Preparation) Meter data has been downsampled according to the desired frequency for analysis using eemeter.as_freq before merging of temperature data or modeling.
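
Item 2.3.6 can be sketched with the standard library as a check for values more than three interquartile ranges above the median. The function name is hypothetical, and the choice of quantile method is an assumption, since the checklist does not say how quartiles should be computed.

```python
import statistics


def flag_high_usage(values, n_iqr=3):
    """Return values more than n_iqr interquartile ranges above the median."""
    # the "inclusive" quartile method is an assumption made for this sketch
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = median + n_iqr * (q3 - q1)
    return [v for v in values if v > threshold]


# made-up daily usage values with one suspicious reading
usage = [10, 11, 12, 13, 14, 15, 16, 17, 100]
flagged = flag_high_usage(usage)
```

A data preparation step could then emit a warning whenever `flagged` is not empty.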

Section 2.4: Matching Sites to Weather Stations

Section 3.2: Balance Points

Section 3.3: Design Matrix (for Daily and Billing Methods)

Section 3.4: Fit Candidate Models

Section 3.5: Computing Derived Quantities

API Docs

CalTRACK

CalTRACK design matrix creation

These functions are designed as shortcuts to common CalTRACK design matrix inputs.

eemeter.create_caltrack_hourly_preliminary_design_matrix(meter_data, temperature_data)[source]

A helper function which calls basic feature creation methods to create an input suitable for use in the first step of creating a CalTRACK hourly model.

Parameters:
  • meter_data (pandas.DataFrame) – Hourly meter data in eemeter format.
  • temperature_data (pandas.Series) – Hourly temperature data in eemeter format.
Returns:

design_matrix – A design matrix with meter_value, hour_of_week, hdd_50, and cdd_65 features.

Return type:

pandas.DataFrame

eemeter.create_caltrack_hourly_segmented_design_matrices(preliminary_design_matrix, segmentation, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]

A helper function which calls basic feature creation methods to create a design matrix suitable for use with segmented CalTRACK hourly models.

Parameters:
Returns:

design_matrix – A dict of design matrices created using the eemeter.caltrack_hourly_fit_feature_processor.

Return type:

dict of pandas.DataFrame

eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)[source]

A helper function which calls basic feature creation methods to create a design matrix suitable for use with CalTRACK daily methods.

Parameters:
  • meter_data (pandas.DataFrame) – Hourly meter data in eemeter format.
  • temperature_data (pandas.Series) – Hourly temperature data in eemeter format.
Returns:

design_matrix – A design matrix with mean usage_per_day, hdd_30-hdd_90, and cdd_30-cdd_90 features.

Return type:

pandas.DataFrame

eemeter.create_caltrack_billing_design_matrix(meter_data, temperature_data)[source]

A helper function which calls basic feature creation methods to create a design matrix suitable for use with CalTRACK Billing methods.

Parameters:
  • meter_data (pandas.DataFrame) – Hourly meter data in eemeter format.
  • temperature_data (pandas.Series) – Hourly temperature data in eemeter format.
Returns:

design_matrix – A design matrix with mean usage_per_day, hdd_30-hdd_90, and cdd_30-cdd_90 features.

Return type:

pandas.DataFrame

CalTRACK Hourly

These classes and functions are designed to assist with running the CalTRACK Hourly methods. See also Quickstart for CalTRACK Hourly.

class eemeter.CalTRACKHourlyModel(segment_models, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]

An object which holds CalTRACK Hourly model data and metadata, and which can be used for prediction.

segment_models

Dictionary of models for each segment, keys are segment names.

Type:dict of eemeter.CalTRACKSegmentModel
occupancy_lookup

A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.

Type:pandas.DataFrame
occupied_temperature_bins

A dataframe of bin endpoint flags for each segment. Segment names are columns.

Type:pandas.DataFrame
unoccupied_temperature_bins

Ditto for the unoccupied mode.

Type:pandas.DataFrame
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.
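
The json() and from_json() pair supports a round trip through a serialized string. The sketch below shows that pattern with a hypothetical stand-in class, so that it runs without eemeter; the real model classes carry more state than this.

```python
import json


class SketchModel:
    """Hypothetical stand-in that mimics the json()/from_json() pattern."""

    def __init__(self, segment_names):
        self.segment_names = list(segment_names)

    def json(self):
        # return plain Python types only, so json.dumps can serialize them
        return {"segment_names": self.segment_names}

    @classmethod
    def from_json(cls, data):
        # data is a dict, for example the result of json.loads
        return cls(data["segment_names"])


original = SketchModel(["jan", "feb"])
serialized = json.dumps(original.json())
restored = SketchModel.from_json(json.loads(serialized))
```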

class eemeter.CalTRACKHourlyModelResults(status, method_name, model=None, warnings=[], metadata=None, settings=None)[source]

Contains information about the chosen model.

status

A string indicating the status of this result. Possible statuses:

  • 'NO DATA': No baseline data was available.
  • 'NO MODEL': A complete model could not be constructed.
  • 'SUCCESS': A model was constructed.
Type:str
method_name

The name of the method used to fit the baseline model.

Type:str
model

The selected model, if any.

Type:eemeter.CalTRACKHourlyModel or None
warnings

A list of any warnings reported during the model selection and fitting process.

Type:list of eemeter.EEMeterWarning
metadata

An arbitrary dictionary of metadata to be associated with this result. This can be used, for example, to tag the results with attributes like an ID:

{
    'id': 'METER_12345678',
}
Type:dict
settings

A dictionary of settings used by the method.

Type:dict
totals_metrics

A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as period totals.

Type:ModelMetrics
avgs_metrics

A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as daily averages.

Type:ModelMetrics
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json(with_candidates=False)[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

predict(prediction_index, temperature_data, **kwargs)[source]

Predict over a particular index using temperature data.

Parameters:
  • prediction_index (pandas.DatetimeIndex) – Time period over which to predict.
  • temperature_data (pandas.DataFrame) – Hourly temperature data to use for prediction. Time period should match the prediction_index argument.
  • **kwargs – Extra keyword arguments to send to self.model.predict
Returns:

prediction – The predicted usage values.

Return type:

pandas.DataFrame

eemeter.caltrack_hourly_fit_feature_processor(segment_name, segmented_data, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]

A function that takes in temperature data and returns a dataframe of features suitable for use with eemeter.fit_caltrack_hourly_model_segment. Designed for use with eemeter.iterate_segmented_dataset.

Parameters:
  • segment_name (str) – The name of the segment.
  • segmented_data (pandas.DataFrame) – Hourly temperature data for the segment.
  • occupancy_lookup (pandas.DataFrame) – A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
  • occupied_temperature_bins (pandas.DataFrame) – A dataframe of bin endpoint flags for each segment. Segment names are columns.
  • unoccupied_temperature_bins (pandas.DataFrame) – Ditto for the unoccupied mode.
Returns:

features – A dataframe of features with the following columns:

  • ’meter_value’: the observed meter value
  • ’hour_of_week’: 0-167
  • ’bin_<0-6>_occupied’: temp bin feature, or 0 if unoccupied
  • ’bin_<0-6>_unoccupied’: temp bin feature or 0 in occupied
  • ’weight’: 0.0 or 0.5 or 1.0

Return type:

pandas.DataFrame

eemeter.caltrack_hourly_prediction_feature_processor(segment_name, segmented_data, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]

A function that takes in temperature data and returns a dataframe of features suitable for use inside eemeter.CalTRACKHourlyModel. Designed for use with eemeter.iterate_segmented_dataset.

Parameters:
  • segment_name (str) – The name of the segment.
  • segmented_data (pandas.DataFrame) – Hourly temperature data for the segment.
  • occupancy_lookup (pandas.DataFrame) – A dataframe with occupancy flags for each hour of the week and each segment. Segment names are columns, occupancy flags are 0 or 1.
  • occupied_temperature_bins (pandas.DataFrame) – A dataframe of bin endpoint flags for each segment. Segment names are columns.
  • unoccupied_temperature_bins (pandas.DataFrame) – Ditto for the unoccupied mode.
Returns:

features – A dataframe of features with the following columns:

  • ’hour_of_week’: 0-167
  • ’bin_<0-6>_occupied’: temp bin feature, or 0 if unoccupied
  • ’bin_<0-6>_unoccupied’: temp bin feature or 0 in occupied
  • ’weight’: 1

Return type:

pandas.DataFrame

eemeter.fit_caltrack_hourly_model_segment(segment_name, segment_data)[source]

Fit a model for a single segment.

Parameters:
Returns:

segment_model – A model that represents the fitted model.

Return type:

CalTRACKSegmentModel

eemeter.fit_caltrack_hourly_model(segmented_design_matrices, occupancy_lookup, occupied_temperature_bins, unoccupied_temperature_bins)[source]

Fit a CalTRACK hourly model

Parameters:
Returns:

model – Has a model.predict method which takes input data and makes a prediction using this model.

Return type:

CalTRACKHourlyModelResults

CalTRACK Daily and Billing (Usage per Day)

These classes and functions are designed to assist with running the CalTRACK Daily and Billing methods. See also Quickstart for CalTRACK Billing/Daily.

class eemeter.CalTRACKUsagePerDayCandidateModel(model_type, formula, status, model_params=None, model=None, result=None, r_squared_adj=None, warnings=None)[source]

Contains information about a candidate model.

model_type

The type of model, e.g., 'hdd_only'.

Type:str
formula

The R-style formula for the design matrix of this model, e.g., 'meter_value ~ hdd_65'.

Type:str
status

A string indicating the status of this model. Possible statuses:

  • 'NOT ATTEMPTED': Candidate model not fitted due to an issue encountered in data before attempt.
  • 'ERROR': A fatal error occurred during model fit process.
  • 'DISQUALIFIED': The candidate model fit was disqualified from the model selection process because of a decision made after candidate model fit completed, e.g., a bad fit, or a parameter out of acceptable range.
  • 'QUALIFIED': The candidate model fit is acceptable and can be considered during model selection.
Type:str
model_params

A flat dictionary of model parameters which must be serializable using the json.dumps method.

Type:dict, default None
model

The raw model (if any) used in fitting. Not serialized.

Type:object
result

The raw modeling result (if any) returned by the model. Not serialized.

Type:object
r_squared_adj

The adjusted r-squared of the candidate model.

Type:float
warnings

A list of any warnings reported during creation of the candidate model.

Type:list of eemeter.EEMeterWarning
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

plot(best=False, ax=None, title=None, figsize=None, temp_range=None, alpha=None, **kwargs)[source]

Plot

predict(prediction_index, temperature_data, with_disaggregated=False, with_design_matrix=False, **kwargs)[source]

Predict

class eemeter.CalTRACKUsagePerDayModelResults(status, method_name, interval=None, model=None, r_squared_adj=None, candidates=None, warnings=None, metadata=None, settings=None)[source]

Contains information about the chosen model.

status

A string indicating the status of this result. Possible statuses:

  • 'NO DATA': No baseline data was available.
  • 'NO MODEL': No candidate models qualified.
  • 'SUCCESS': A qualified candidate model was chosen.
Type:str
method_name

The name of the method used to fit the baseline model.

Type:str
model

The selected candidate model, if any.

Type:eemeter.CalTRACKUsagePerDayCandidateModel or None
r_squared_adj

The adjusted r-squared of the selected model.

Type:float
candidates

A list of any model candidates encountered during the model selection and fitting process.

Type:list of eemeter.CalTRACKUsagePerDayCandidateModel
warnings

A list of any warnings reported during the model selection and fitting process.

Type:list of eemeter.EEMeterWarning
metadata

An arbitrary dictionary of metadata to be associated with this result. This can be used, for example, to tag the results with attributes like an ID:

{
    'id': 'METER_12345678',
}
Type:dict
settings

A dictionary of settings used by the method.

Type:dict
totals_metrics

A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as period totals.

Type:ModelMetrics
avgs_metrics

A ModelMetrics object, if one is calculated and associated with this model. (This initializes to None.) The ModelMetrics object contains model fit information and descriptive statistics about the underlying data, with that data expressed as daily averages.

Type:ModelMetrics
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json(with_candidates=False)[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

plot(ax=None, title=None, figsize=None, with_candidates=False, candidate_alpha=None, temp_range=None)[source]

Plot a model fit.

Parameters:
  • ax (matplotlib.axes.Axes, optional) – Existing axes to plot on.
  • title (str, optional) – Chart title.
  • figsize (tuple, optional) – (width, height) of chart.
  • with_candidates (bool) – If True, also plot candidate models.
  • candidate_alpha (float between 0 and 1) – Transparency at which to plot candidate models. 0 fully transparent, 1 fully opaque.
Returns:

ax – Matplotlib axes.

Return type:

matplotlib.axes.Axes

class eemeter.DataSufficiency(status, criteria_name, warnings=None, data=None, settings=None)[source]

Contains the result of a data sufficiency check.

status

A string indicating the status of this result. Possible statuses:

  • 'NO DATA': No baseline data was available.
  • 'FAIL': Data did not meet criteria.
  • 'PASS': Data met criteria.
Type:str
criteria_name

The name of the criteria method used to check for baseline data sufficiency.

Type:str
warnings

A list of any warnings reported during the check for baseline data sufficiency.

Type:list of eemeter.EEMeterWarning
data

A dictionary of data related to determining whether a warning should be generated.

Type:dict
settings

A dictionary of settings (keyword arguments) used.

Type:dict
json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

class eemeter.ModelPrediction(result, design_matrix, warnings)
design_matrix

Alias for field number 1

result

Alias for field number 0

warnings

Alias for field number 2
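
The "Alias for field number" entries suggest that ModelPrediction behaves like a named tuple, so its fields can be read by name or unpacked by position. The sketch below uses a stand-in type with the same field order; it is an illustration and does not import eemeter.

```python
from collections import namedtuple

# stand-in with the same field order as documented above
ModelPredictionSketch = namedtuple(
    "ModelPredictionSketch", ["result", "design_matrix", "warnings"]
)

prediction = ModelPredictionSketch(
    result=[1.0, 2.0], design_matrix=None, warnings=[]
)

# access by name, or unpack by position (result is field 0, and so on)
first_value = prediction.result[0]
result, design_matrix, warnings = prediction
```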

eemeter.fit_caltrack_usage_per_day_model(data, fit_cdd=True, use_billing_presets=False, minimum_non_zero_cdd=10, minimum_non_zero_hdd=10, minimum_total_cdd=20, minimum_total_hdd=20, beta_cdd_maximum_p_value=1, beta_hdd_maximum_p_value=1, weights_col=None, fit_intercept_only=True, fit_cdd_only=True, fit_hdd_only=True, fit_cdd_hdd=True)[source]

CalTRACK daily and billing methods using a usage-per-day modeling strategy.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns each of the form hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods. Should have a pandas.DatetimeIndex.
  • fit_cdd (bool, optional) – If True, fit CDD models unless overridden by fit_cdd_only or fit_cdd_hdd flags. Should be set to False for gas meter data.
  • use_billing_presets (bool, optional) – Use presets appropriate for billing models. Otherwise defaults are appropriate for daily models.
  • minimum_non_zero_cdd (int, optional) – Minimum allowable number of non-zero cooling degree day values.
  • minimum_non_zero_hdd (int, optional) – Minimum allowable number of non-zero heating degree day values.
  • minimum_total_cdd (float, optional) – Minimum allowable total sum of cooling degree day values.
  • minimum_total_hdd (float, optional) – Minimum allowable total sum of heating degree day values.
  • beta_cdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta cdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
  • beta_hdd_maximum_p_value (float, optional) – The maximum allowable p-value of the beta hdd parameter. The default value is the most permissive possible (i.e., 1). This is here for backwards compatibility with CalTRACK 1.0 methods.
  • weights_col (str or None, optional) – The name of the column (if any) in data to use as weights. Weight must be the number of days of data in the period.
  • fit_intercept_only (bool, optional) – If True, fit and consider intercept_only model candidates.
  • fit_cdd_only (bool, optional) – If True, fit and consider cdd_only model candidates. Ignored if fit_cdd=False.
  • fit_hdd_only (bool, optional) – If True, fit and consider hdd_only model candidates.
  • fit_cdd_hdd (bool, optional) – If True, fit and consider cdd_hdd model candidates. Ignored if fit_cdd=False.
Returns:

model_results – Results of running CalTRACK daily method. See eemeter.CalTRACKUsagePerDayModelResults for more details.

Return type:

eemeter.CalTRACKUsagePerDayModelResults
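
The relationship between candidate statuses and the result status can be sketched as follows. The sketch assumes that the selected model is the qualified candidate with the highest adjusted r-squared, which is an assumption about the selection rule rather than something stated in this reference; the candidates are plain dicts with made-up values.

```python
def select_best_candidate(candidates):
    """Return (status, candidate) using the documented status strings."""
    qualified = [c for c in candidates if c["status"] == "QUALIFIED"]
    if not qualified:
        return "NO MODEL", None
    # assumed rule: prefer the highest adjusted r-squared among qualified fits
    best = max(qualified, key=lambda c: c["r_squared_adj"])
    return "SUCCESS", best


candidates = [
    {"model_type": "intercept_only", "status": "QUALIFIED", "r_squared_adj": 0.0},
    {"model_type": "hdd_only", "status": "QUALIFIED", "r_squared_adj": 0.82},
    {"model_type": "cdd_hdd", "status": "DISQUALIFIED", "r_squared_adj": 0.9},
]

status, chosen = select_best_candidate(candidates)
```

Note that the disqualified candidate is ignored even though its adjusted r-squared is the highest.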

eemeter.caltrack_sufficiency_criteria(data_quality, requested_start, requested_end, num_days=365, min_fraction_daily_coverage=0.9, min_fraction_hourly_temperature_coverage_per_period=0.9)[source]

CalTRACK daily data sufficiency criteria.

Note

For CalTRACK compliance, min_fraction_daily_coverage must be set at 0.9 (section 2.2.1.2), and requested_start and requested_end must not be None (section 2.2.4).

Parameters:
  • data_quality (pandas.DataFrame) – A DataFrame containing at least the column meter_value and the two columns temperature_null, containing a count of null hourly temperature values for each meter value, and temperature_not_null, containing a count of not-null hourly temperature values for each meter value. Should have a pandas.DatetimeIndex.
  • requested_start (datetime.datetime, timezone aware (or None)) – The desired start of the period, if any, especially if this is different from the start of the data. If given, warnings are reported on the basis of this start date instead of data start date. Must be explicitly set to None in order to use data start date.
  • requested_end (datetime.datetime, timezone aware (or None)) – The desired end of the period, if any, especially if this is different from the end of the data. If given, warnings are reported on the basis of this end date instead of data end date. Must be explicitly set to None in order to use data end date.
  • num_days (int, optional) – Exact number of days allowed in data, including extent given by requested_start or requested_end, if given.
  • min_fraction_daily_coverage (float, optional) – Minimum fraction of days of data in total data extent for which data must be available.
  • min_fraction_hourly_temperature_coverage_per_period (float, optional) – Minimum fraction of hours of temperature data coverage in a particular period. Anything below this causes the whole period to be considered missing.
Returns:

data_sufficiency – An object containing sufficiency status and warnings for this data.

Return type:

eemeter.DataSufficiency
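As a rough illustration of the daily coverage criterion, the sketch below computes the fraction of days in a requested period that have a non-null meter value and compares it with a 0.9 threshold. It uses only the standard library so that it runs without eemeter installed. The helper name fraction_daily_coverage and the dict-based data layout are assumptions made for this example and are not part of the eemeter API.

```python
from datetime import date, timedelta


def fraction_daily_coverage(daily_values, requested_start, requested_end):
    """Fraction of days in [requested_start, requested_end) with a non-null value.

    Hypothetical helper for illustration only; not part of eemeter.
    """
    n_days = (requested_end - requested_start).days
    if n_days <= 0:
        return 0.0
    n_valid = 0
    for offset in range(n_days):
        day = requested_start + timedelta(days=offset)
        # A day counts as covered only if it is present and not None.
        if daily_values.get(day) is not None:
            n_valid += 1
    return n_valid / n_days


start = date(2017, 1, 1)
end = date(2017, 1, 11)  # a 10-day requested period
values = {start + timedelta(days=i): 1.0 for i in range(10)}
values[date(2017, 1, 4)] = None  # one null reading
del values[date(2017, 1, 9)]     # one day missing entirely

coverage = fraction_daily_coverage(values, start, end)
is_sufficient = coverage >= 0.9  # threshold mirrors min_fraction_daily_coverage
```

Here two of the ten days lack data, so the coverage is 0.8 and the period would not meet a 0.9 threshold.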

eemeter.caltrack_usage_per_day_predict(model_type, model_params, prediction_index, temperature_data, degree_day_method='daily', with_disaggregated=False, with_design_matrix=False)[source]

CalTRACK predict method.

Given a model type, parameters, hourly temperatures, and a pandas.DatetimeIndex over which to predict meter usage, return model predictions as totals for each period (billing period totals, daily totals, etc.). Optionally include the computed design matrix or disaggregated usage in the output dataframe.

Parameters:
  • model_type (str) – Model type (e.g., 'cdd_hdd').
  • model_params (dict) – Parameters as stored in eemeter.CalTRACKUsagePerDayCandidateModel.model_params.
  • temperature_data (pandas.DataFrame) – Hourly temperature data to use for prediction. Time period should match the prediction_index argument.
  • prediction_index (pandas.DatetimeIndex) – Time period over which to predict.
  • with_disaggregated (bool, optional) – If True, return results as a pandas.DataFrame with columns 'base_load', 'heating_load', and 'cooling_load'.
  • with_design_matrix (bool, optional) – If True, return results as a pandas.DataFrame with columns 'n_days', 'n_days_dropped', 'n_days_kept', and 'temperature_mean'.
Returns:

  • prediction (pandas.DataFrame) – Columns are as follows:
    • predicted_usage: Predicted usage values computed to match prediction_index.
    • base_load: modeled base load (only for with_disaggregated=True).
    • cooling_load: modeled cooling load (only for with_disaggregated=True).
    • heating_load: modeled heating load (only for with_disaggregated=True).
    • n_days: number of days in period (only for with_design_matrix=True).
    • n_days_dropped: number of days dropped because of insufficient data (only for with_design_matrix=True).
    • n_days_kept: number of days kept because of sufficient data (only for with_design_matrix=True).
    • temperature_mean: mean temperature during given period. (only for with_design_matrix=True).
  • predict_warnings (list of EEMeterWarning) – Warnings generated during prediction, if any.
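To make the relationship between model parameters, degree days, and period totals concrete, here is a minimal sketch of a degree-day prediction for a single period. It is not the library's implementation. The model_params key names ('intercept', 'beta_cdd', 'beta_hdd', 'cooling_balance_point', 'heating_balance_point') and the helper predict_usage_per_day are assumptions for illustration.

```python
def predict_usage_per_day(model_type, model_params, mean_temperature):
    """Return (base_load, heating_load, cooling_load) per day for one period.

    Hypothetical helper; the model_params key names are assumed.
    """
    base_load = model_params["intercept"]
    heating_load = 0.0
    cooling_load = 0.0
    if model_type in ("hdd_only", "cdd_hdd"):
        # Heating degree days: how far the temperature falls below the balance point.
        hdd = max(model_params["heating_balance_point"] - mean_temperature, 0.0)
        heating_load = model_params["beta_hdd"] * hdd
    if model_type in ("cdd_only", "cdd_hdd"):
        # Cooling degree days: how far the temperature rises above the balance point.
        cdd = max(mean_temperature - model_params["cooling_balance_point"], 0.0)
        cooling_load = model_params["beta_cdd"] * cdd
    return base_load, heating_load, cooling_load


params = {
    "intercept": 10.0,
    "beta_cdd": 2.0,
    "beta_hdd": 1.5,
    "cooling_balance_point": 70.0,
    "heating_balance_point": 60.0,
}
n_days = 30
base, heating, cooling = predict_usage_per_day("cdd_hdd", params, 50.0)
# Per-day usage is scaled by the number of days to give a period total.
predicted_usage = (base + heating + cooling) * n_days
```

The three per-day components correspond to the 'base_load', 'heating_load', and 'cooling_load' columns that with_disaggregated=True adds to the output.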

eemeter.plot_caltrack_candidate(candidate, best=False, ax=None, title=None, figsize=None, temp_range=None, alpha=None, **kwargs)[source]

Plot a CalTRACK candidate model.

Parameters:
  • candidate (eemeter.CalTRACKUsagePerDayCandidateModel) – A candidate model with a predict function.
  • best (bool, optional) – Whether this is the best candidate or not.
  • ax (matplotlib.axes.Axes, optional) – Existing axes to plot on.
  • title (str, optional) – Chart title.
  • figsize (tuple, optional) – (width, height) of chart.
  • temp_range (tuple, optional) – (min, max) temperatures to plot model.
  • alpha (float between 0 and 1, optional) – Transparency, 0 fully transparent, 1 fully opaque.
  • **kwargs – Keyword arguments for matplotlib.axes.Axes.plot
Returns:

ax – Matplotlib axes.

Return type:

matplotlib.axes.Axes

eemeter.get_too_few_non_zero_degree_day_warning(model_type, balance_point, degree_day_type, degree_days, minimum_non_zero)[source]

Return an empty list or a single warning wrapped in a list regarding non-zero degree days for a set of degree days.

Parameters:
  • model_type (str) – Model type (e.g., 'cdd_hdd').
  • balance_point (float) – The balance point in question.
  • degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
  • degree_days (pandas.Series) – A series of degree day values.
  • minimum_non_zero (int) – Minimum allowable number of non-zero degree day values.
Returns:

warnings – Empty list or list of single warning.

Return type:

list of eemeter.EEMeterWarning

eemeter.get_total_degree_day_too_low_warning(model_type, balance_point, degree_day_type, avg_degree_days, period_days, minimum_total)[source]

Return an empty list or a single warning wrapped in a list regarding the total summed degree day values.

Parameters:
  • model_type (str) – Model type (e.g., 'cdd_hdd').
  • balance_point (float) – The balance point in question.
  • degree_day_type (str) – The type of degree days ('cdd' or 'hdd').
  • avg_degree_days (pandas.Series) – A series of degree day values.
  • period_days (pandas.Series) – A series containing day counts.
  • minimum_total (float) – Minimum allowable total sum of degree day values.
Returns:

warnings – Empty list or list of single warning.

Return type:

list of eemeter.EEMeterWarning
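The "empty list or list of a single warning" pattern can be sketched as follows. The example assumes that the total is the sum of average degree days multiplied by the day count of each period, and it represents the warning as a plain dict. Both the helper name and the dict layout are hypothetical stand-ins, not the eemeter.EEMeterWarning class.

```python
def total_degree_day_too_low(avg_degree_days, period_days, minimum_total):
    """Return [] when the total is acceptable, else a list with one warning dict.

    Hypothetical helper; the warning is a plain dict rather than an EEMeterWarning.
    """
    # Convert per-day averages back to period totals before summing.
    total = sum(dd * days for dd, days in zip(avg_degree_days, period_days))
    if total < minimum_total:
        return [
            {
                "description": "Total degree days below accepted minimum.",
                "data": {"total": total, "minimum_total": minimum_total},
            }
        ]
    return []


avg_hdd = [2.0, 0.5, 0.0]   # average heating degree days per day, per period
days = [31, 28, 31]         # number of days in each period
too_low = total_degree_day_too_low(avg_hdd, days, minimum_total=100)
acceptable = total_degree_day_too_low(avg_hdd, days, minimum_total=50)
```

Returning a list in both cases lets callers concatenate the results of several checks without testing for None.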

eemeter.get_parameter_negative_warning(model_type, model_params, parameter)[source]

Return an empty list or a single warning wrapped in a list indicating whether model parameter is negative.

Parameters:
Returns:

warnings – Empty list or list of single warning.

Return type:

list of eemeter.EEMeterWarning

eemeter.get_parameter_p_value_too_high_warning(model_type, model_params, parameter, p_value, maximum_p_value)[source]

Return an empty list or a single warning wrapped in a list indicating whether model parameter p-value is too high.

Parameters:
Returns:

warnings – Empty list or list of single warning.

Return type:

list of eemeter.EEMeterWarning

eemeter.get_single_cdd_only_candidate_model(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col, balance_point)[source]

Return a single candidate cdd-only model for a particular balance point.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value and cdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  • minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  • beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
  • balance_point (float) – The cooling balance point for this model.
Returns:

candidate_model – A single cdd-only candidate model, with any associated warnings.

Return type:

CalTRACKUsagePerDayCandidateModel
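Conceptually, a cdd-only candidate regresses meter_value on a single cdd_<balance_point> column, optionally weighted by weights_col. The sketch below shows a weighted least squares fit with one regressor using only the standard library. It is a simplified stand-in for the idea, not the library's fitting code, and the helper name weighted_linear_fit is an assumption.

```python
def weighted_linear_fit(x, y, weights=None):
    """Fit y = intercept + beta * x by weighted least squares.

    Returns (intercept, beta, r_squared). Hypothetical helper for illustration.
    """
    if weights is None:
        weights = [1.0] * len(x)
    w_sum = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted sums of squares and cross products around the weighted means.
    s_xy = sum(w * (xi - x_mean) * (yi - y_mean) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    s_yy = sum(w * (yi - y_mean) ** 2 for w, yi in zip(weights, y))
    beta = s_xy / s_xx
    intercept = y_mean - beta * x_mean
    r_squared = (s_xy ** 2) / (s_xx * s_yy)
    return intercept, beta, r_squared


cdd_65 = [0.0, 2.0, 5.0, 10.0]            # cooling degree days per day
meter_value = [20.0, 24.0, 30.0, 40.0]    # usage per day, exactly 20 + 2 * cdd
n_days = [31, 30, 31, 30]                 # weights: days of data in each period
intercept, beta_cdd, r_squared = weighted_linear_fit(cdd_65, meter_value, n_days)
```

Because the example data lie exactly on a line, the fit recovers an intercept of about 20 and a slope of about 2.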

eemeter.get_single_hdd_only_candidate_model(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col, balance_point)[source]

Return a single candidate hdd-only model for a particular balance point.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value and hdd_<balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  • minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  • beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
  • balance_point (float) – The heating balance point for this model.
Returns:

candidate_model – A single hdd-only candidate model, with any associated warnings.

Return type:

CalTRACKUsagePerDayCandidateModel

eemeter.get_single_cdd_hdd_candidate_model(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col, cooling_balance_point, heating_balance_point)[source]

Fit and return a single candidate cdd_hdd model for a particular selection of cooling balance point and heating balance point.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the columns meter_value, hdd_<heating_balance_point>, and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  • minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  • minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  • minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  • beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  • beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
  • cooling_balance_point (float) – The cooling balance point for this model.
  • heating_balance_point (float) – The heating balance point for this model.
Returns:

candidate_model – A single cdd-hdd candidate model, with any associated warnings.

Return type:

CalTRACKUsagePerDayCandidateModel

eemeter.get_intercept_only_candidate_models(data, weights_col)[source]

Return a list of a single candidate intercept-only model.

Parameters:
Returns:

candidate_models – List containing a single intercept-only candidate model.

Return type:

list of CalTRACKUsagePerDayCandidateModel

eemeter.get_cdd_only_candidate_models(data, minimum_non_zero_cdd, minimum_total_cdd, beta_cdd_maximum_p_value, weights_col)[source]

Return a list of all possible candidate cdd-only models.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form cdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  • minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  • beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns:

candidate_models – A list of cdd-only candidate models, with any associated warnings.

Return type:

list of CalTRACKUsagePerDayCandidateModel
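Since every column named cdd_<balance_point> yields one candidate, the set of balance points can be recovered from the column names alone. The following sketch shows one way to do that parsing. The helper balance_points_from_columns is hypothetical, and the library may discover the columns differently.

```python
def balance_points_from_columns(columns, prefix):
    """Return sorted balance points parsed from names such as 'cdd_65'.

    Hypothetical helper for illustration only.
    """
    points = []
    for name in columns:
        if name.startswith(prefix + "_"):
            # Everything after the prefix and underscore is the balance point.
            points.append(float(name[len(prefix) + 1:]))
    return sorted(points)


columns = ["meter_value", "cdd_70", "cdd_65", "hdd_60", "temperature_mean"]
cdd_points = balance_points_from_columns(columns, "cdd")
hdd_points = balance_points_from_columns(columns, "hdd")
```

Each entry in cdd_points would then correspond to one call of the single-candidate function with that balance_point.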

eemeter.get_hdd_only_candidate_models(data, minimum_non_zero_hdd, minimum_total_hdd, beta_hdd_maximum_p_value, weights_col)[source]

Return a list of all possible candidate hdd-only models.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns with names of the form hdd_<balance_point>. All columns with names of this form will be used to fit a candidate model. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  • minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  • beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns:

candidate_models – A list of hdd-only candidate models, with any associated warnings.

Return type:

list of CalTRACKUsagePerDayCandidateModel

eemeter.get_cdd_hdd_candidate_models(data, minimum_non_zero_cdd, minimum_non_zero_hdd, minimum_total_cdd, minimum_total_hdd, beta_cdd_maximum_p_value, beta_hdd_maximum_p_value, weights_col)[source]

Return a list of candidate cdd_hdd models fit over the cooling balance point and heating balance point columns available in data.

Parameters:
  • data (pandas.DataFrame) – A DataFrame containing at least the column meter_value and 1 to n columns each of the form hdd_<heating_balance_point> and cdd_<cooling_balance_point>. DataFrames of this form can be made using the eemeter.create_caltrack_daily_design_matrix or eemeter.create_caltrack_billing_design_matrix methods.
  • minimum_non_zero_cdd (int) – Minimum allowable number of non-zero cooling degree day values.
  • minimum_non_zero_hdd (int) – Minimum allowable number of non-zero heating degree day values.
  • minimum_total_cdd (float) – Minimum allowable total sum of cooling degree day values.
  • minimum_total_hdd (float) – Minimum allowable total sum of heating degree day values.
  • beta_cdd_maximum_p_value (float) – The maximum allowable p-value of the beta cdd parameter.
  • beta_hdd_maximum_p_value (float) – The maximum allowable p-value of the beta hdd parameter.
  • weights_col (str or None) – The name of the column (if any) in data to use as weights.
Returns:

candidate_models – A list of cdd_hdd candidate models, with any associated warnings.

Return type:

list of CalTRACKUsagePerDayCandidateModel

eemeter.select_best_candidate(candidate_models)[source]

Select and return the best candidate model based on r-squared and qualification.

Parameters: candidate_models (list of eemeter.CalTRACKUsagePerDayCandidateModel) – Candidate models to select from.
Returns: (best_candidate, warnings) – Return the candidate model with highest r-squared or None if none meet the requirements, and a list of warnings about this selection (or lack of selection).
Return type: tuple of eemeter.CalTRACKUsagePerDayCandidateModel or None and list of eemeter.EEMeterWarning
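The selection rule (highest r-squared among candidates that meet the requirements, otherwise None plus a warning) can be sketched as below. The Candidate dataclass, its qualified flag, and the string warning are simplified assumptions standing in for eemeter.CalTRACKUsagePerDayCandidateModel and eemeter.EEMeterWarning.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Simplified stand-in for a candidate model (hypothetical)."""
    model_type: str
    r_squared: float
    qualified: bool


def select_best(candidates):
    """Return (best_candidate, warnings) following the rule described above."""
    qualified = [c for c in candidates if c.qualified]
    if not qualified:
        return None, ["No qualified model candidates available."]
    return max(qualified, key=lambda c: c.r_squared), []


candidates = [
    Candidate("intercept_only", 0.00, True),
    Candidate("hdd_only", 0.72, True),
    Candidate("cdd_hdd", 0.95, False),  # best fit, but does not qualify
]
best, selection_warnings = select_best(candidates)
none_best, none_warnings = select_best([Candidate("cdd_only", 0.9, False)])
```

Note that the disqualified cdd_hdd candidate is skipped even though its r-squared is highest.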

Savings

These methods are designed for computing metered and normal year savings.

eemeter.metered_savings(baseline_model, reporting_meter_data, temperature_data, with_disaggregated=False, confidence_level=0.9, predict_kwargs=None)[source]

Compute metered savings, i.e., savings in which the baseline model is used to calculate the modeled usage in the reporting period. This modeled usage is then compared to the actual usage from the reporting period. Also compute two measures of the uncertainty of the aggregate savings estimate, a fractional savings uncertainty (FSU) error band and an OLS error band. (To convert the FSU error band into FSU, divide by total estimated savings.)

Parameters:
  • baseline_model (eemeter.CalTRACKUsagePerDayModelResults) – Object to use for predicting pre-intervention usage.
  • reporting_meter_data (pandas.DataFrame) – The observed reporting period data (totals). Savings will be computed for the periods supplied in the reporting period data.
  • temperature_data (pandas.Series) – Hourly-frequency timeseries of temperature data during the reporting period.
  • with_disaggregated (bool, optional) – If True, calculate baseline counterfactual disaggregated usage estimates. Savings cannot be disaggregated for metered savings. For that, use eemeter.modeled_savings.
  • confidence_level (float, optional) –

    The two-tailed confidence level used to calculate the t-statistic used in calculation of the error bands.

    Ignored if not computing error bands.

  • predict_kwargs (dict, optional) – Extra kwargs to pass to the baseline_model.predict method.
Returns:

  • results (pandas.DataFrame) – DataFrame with metered savings, indexed with reporting_meter_data.index. Will include the following columns:

    • counterfactual_usage (baseline model projected into reporting period)
    • reporting_observed (given by reporting_meter_data)
    • metered_savings

    If with_disaggregated is set to True, the following columns will also be in the results DataFrame:

    • counterfactual_base_load
    • counterfactual_heating_load
    • counterfactual_cooling_load
  • error_bands (dict, optional) – If baseline_model is an instance of CalTRACKUsagePerDayModelResults, will also return a dictionary of FSU and OLS error bands for the aggregated energy savings over the post period.
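The arithmetic behind the result columns is simple enough to show directly. The sketch below assumes that metered savings in each period are the counterfactual usage minus the observed usage, and it applies the FSU conversion described above. All numbers, including the error band value, are made up for illustration.

```python
# Baseline model projected into the reporting period (made-up values).
counterfactual_usage = [30.0, 28.0, 35.0]
# Observed reporting period usage for the same periods (made-up values).
reporting_observed = [25.0, 29.0, 30.0]

# Savings per period: positive when observed usage is below the counterfactual.
metered_savings = [c - o for c, o in zip(counterfactual_usage, reporting_observed)]
total_savings = sum(metered_savings)

# Converting an FSU error band into FSU: divide by total estimated savings.
fsu_error_band = 4.5  # made-up value for illustration
fsu = fsu_error_band / total_savings
```

A negative value in metered_savings, as in the second period here, indicates usage above what the baseline model projected.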

eemeter.modeled_savings(baseline_model, reporting_model, result_index, temperature_data, with_disaggregated=False, confidence_level=0.9, predict_kwargs=None)[source]

Compute modeled savings, i.e., savings in which baseline and reporting usage values are based on models. This is appropriate for annualizing or weather normalizing models.

Parameters:
  • baseline_model (eemeter.CalTRACKUsagePerDayCandidateModel) – Model to use for predicting pre-intervention usage.
  • reporting_model (eemeter.CalTRACKUsagePerDayCandidateModel) – Model to use for predicting post-intervention usage.
  • result_index (pandas.DatetimeIndex) – The dates for which usage should be modeled.
  • temperature_data (pandas.Series) – Hourly-frequency timeseries of temperature data during the modeled period.
  • with_disaggregated (bool, optional) – If True, calculate modeled disaggregated usage estimates and savings.
  • confidence_level (float, optional) –

    The two-tailed confidence level used to calculate the t-statistic used in calculation of the error bands.

    Ignored if not computing error bands.

  • predict_kwargs (dict, optional) – Extra kwargs to pass to the baseline_model.predict and reporting_model.predict methods.
Returns:

  • results (pandas.DataFrame) – DataFrame with modeled savings, indexed with the result_index. Will include the following columns:

    • modeled_baseline_usage
    • modeled_reporting_usage
    • modeled_savings

    If with_disaggregated is set to True, the following columns will also be in the results DataFrame:

    • modeled_baseline_base_load
    • modeled_baseline_cooling_load
    • modeled_baseline_heating_load
    • modeled_reporting_base_load
    • modeled_reporting_cooling_load
    • modeled_reporting_heating_load
    • modeled_base_load_savings
    • modeled_cooling_load_savings
    • modeled_heating_load_savings
  • error_bands (dict, optional) – If baseline_model and reporting_model are instances of CalTRACKUsagePerDayModelResults, will also return a dictionary of FSU and error bands for the aggregated energy savings over the normal year period.
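With with_disaggregated=True, the component columns relate to the totals in a straightforward way. The sketch below assumes that each savings column is the baseline value minus the reporting value, and that usage is the sum of base, heating, and cooling loads. The numbers are made up, and the dict layout is only for illustration.

```python
# Modeled loads for one period under each model (made-up values).
baseline = {"base_load": 300.0, "heating_load": 120.0, "cooling_load": 80.0}
reporting = {"base_load": 280.0, "heating_load": 90.0, "cooling_load": 70.0}

# Component savings, analogous to modeled_base_load_savings and so on.
component_savings = {name: baseline[name] - reporting[name] for name in baseline}

modeled_baseline_usage = sum(baseline.values())
modeled_reporting_usage = sum(reporting.values())
modeled_savings = modeled_baseline_usage - modeled_reporting_usage
```

Under these assumptions the component savings add up to the total modeled savings.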

Exceptions

These exceptions are used in the package to indicate various common issues.

exception eemeter.EEMeterError[source]

Base class for EEmeter library errors.

exception eemeter.NoBaselineDataError[source]

Error indicating lack of baseline data.

exception eemeter.NoReportingDataError[source]

Error indicating lack of reporting data.

exception eemeter.MissingModelParameterError[source]

Error indicating missing model parameter.

exception eemeter.UnrecognizedModelTypeError[source]

Error indicating unrecognized model type.

Features

These methods are used to compute features that are used in creating CalTRACK models.

eemeter.compute_usage_per_day_feature(meter_data, series_name='usage_per_day')[source]

Compute average usage per day for billing/daily data.

Parameters:
  • meter_data (pandas.DataFrame) – Meter data for which to compute usage per day.
  • series_name (str) – Name of the output pandas.Series.
Returns:

usage_per_day_feature – The usage per day feature.

Return type:

pandas.Series
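As a sketch of what "usage per day" means for billing data, the example below divides each period total by the length of the period in days. It assumes that each value covers the span from its own start timestamp to the next one and that the final row only marks the end of the last period. That layout is an assumption of this example.

```python
from datetime import datetime, timezone

utc = timezone.utc
# Period start timestamps; the last entry only closes the final period (assumed layout).
starts = [
    datetime(2017, 1, 1, tzinfo=utc),
    datetime(2017, 2, 1, tzinfo=utc),
    datetime(2017, 3, 1, tzinfo=utc),
]
totals = [310.0, 280.0, None]  # usage totals per period; None for the end marker

usage_per_day = []
for i in range(len(starts) - 1):
    # Length of the period in (possibly fractional) days.
    n_days = (starts[i + 1] - starts[i]).total_seconds() / 86400
    usage_per_day.append(totals[i] / n_days)
```

January has 31 days and February 2017 has 28, so both periods come out at 10.0 units per day.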

eemeter.compute_occupancy_feature(hour_of_week, occupancy)[source]

Given an hour of week feature, determine the occupancy for that hour of week.

Parameters:
Returns:

occupancy_feature – Occupancy labels for the timeseries.

Return type:

pandas.Series

eemeter.compute_temperature_features(meter_data_index, temperature_data, heating_balance_points=None, cooling_balance_points=None, data_quality=False, temperature_mean=True, degree_day_method='daily', percent_hourly_coverage_per_day=0.5, percent_hourly_coverage_per_billing_period=0.9, use_mean_daily_values=True, tolerance=None, keep_partial_nan_rows=False)[source]

Compute temperature features from hourly temperature data using the pandas.DatetimeIndex of the meter data.

Creates a pandas.DataFrame with the same index as the meter data.

Note

For CalTRACK compliance (2.2.2.3), must set percent_hourly_coverage_per_day=0.5, cooling_balance_points=range(30,90,X), and heating_balance_points=range(30,90,X), where X is either 1, 2, or 3. For natural gas meter use data, must set fit_cdd=False.

Note

For CalTRACK compliance (2.2.3.2), for billing methods, must set percent_hourly_coverage_per_billing_period=0.9.

Note

For CalTRACK compliance (2.3.3), meter_data_index and temperature_data must both be timezone-aware and have matching timezones.

Note

For CalTRACK compliance (3.3.1.1), for billing methods, must set use_mean_daily_values=True.

Note

For CalTRACK compliance (3.3.1.2), for daily or billing methods, must set degree_day_method='daily'.

Parameters:
  • meter_data_index (pandas.DatetimeIndex) – A pandas.DatetimeIndex corresponding to the index over which to compute temperature features.
  • temperature_data (pandas.Series) – Series with pandas.DatetimeIndex with hourly ('H') frequency and a set of temperature values.
  • cooling_balance_points (list of int or float, optional) – List of cooling balance points for which to create cooling degree days.
  • heating_balance_points (list of int or float, optional) – List of heating balance points for which to create heating degree days.
  • data_quality (bool, optional) – If True, compute data quality columns for temperature, i.e., temperature_not_null and temperature_null, containing for each meter value the counts of not-null and null hourly temperature values, respectively.
  • temperature_mean (bool, optional) – If True, compute temperature means for each meter period.
  • degree_day_method (str, 'daily' or 'hourly') – The method to use in calculating degree days.
  • percent_hourly_coverage_per_day (float, optional) – Fraction (between 0 and 1) of hourly temperature coverage per day required for heating and cooling degree days to not be dropped.
  • use_mean_daily_values (bool, optional) – If True, meter and degree day values should be mean daily values, not totals. If False, totals will be used instead.
  • tolerance (pandas.Timedelta, optional) – Do not merge more than this amount of temperature data beyond this limit.
  • keep_partial_nan_rows (bool, optional) – If True, keeps data in resultant pandas.DataFrame that has missing temperature or meter data. Otherwise, these rows are overwritten entirely with numpy.nan values.
Returns:

data – A dataset with the specified parameters.

Return type:

pandas.DataFrame
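To illustrate how degree days and the hourly coverage threshold interact, here is a minimal sketch for a single day. It assumes that the 'daily' method computes degree days from the daily mean temperature. The helper daily_degree_days is hypothetical and uses plain lists rather than pandas objects.

```python
def daily_degree_days(hourly_temps, heating_balance_point, cooling_balance_point,
                      min_coverage=0.5):
    """Return (hdd, cdd) for one day, or (None, None) if coverage is too low.

    hourly_temps is a list of 24 readings with None for missing hours.
    Hypothetical helper for illustration only.
    """
    valid = [t for t in hourly_temps if t is not None]
    # Drop the day when too few hourly readings are available.
    if len(valid) / len(hourly_temps) < min_coverage:
        return None, None
    mean_temp = sum(valid) / len(valid)
    hdd = max(heating_balance_point - mean_temp, 0.0)
    cdd = max(mean_temp - cooling_balance_point, 0.0)
    return hdd, cdd


cold_day = [40.0] * 24
sparse_day = [40.0] * 6 + [None] * 18  # only 25% hourly coverage

hdd, cdd = daily_degree_days(cold_day, heating_balance_point=60, cooling_balance_point=70)
dropped = daily_degree_days(sparse_day, heating_balance_point=60, cooling_balance_point=70)
```

The min_coverage argument plays the role of percent_hourly_coverage_per_day in this sketch.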

eemeter.compute_temperature_bin_features(temperatures, bin_endpoints)[source]

Compute temperature bin features.

Parameters:
  • temperatures (pandas.Series) – Hourly temperature data.
  • bin_endpoints (list of int or float) – List of bin endpoints to use when assigning features.
Returns:

temperature_bin_features – A dataframe with the input index and one column per bin. The sum of each row (across all of the temperature bins) equals the input temperature. More details on this bin feature are available in the CalTRACK documentation.

Return type:

pandas.DataFrame
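One construction that satisfies the row-sum property is to let each bin absorb the part of the temperature that falls within its range. The sketch below shows that idea for a single temperature value. It is an assumed reading of the bin feature, so the CalTRACK documentation remains the authoritative description, and the helper name is hypothetical.

```python
def temperature_bin_features(temperature, bin_endpoints):
    """Split one temperature across bins so that the parts sum to the input.

    Hypothetical helper; an assumed construction, not the library's code.
    """
    edges = [float("-inf")] + list(bin_endpoints) + [float("inf")]
    features = []
    for left, right in zip(edges[:-1], edges[1:]):
        if left == float("-inf"):
            # The lowest bin holds the temperature up to the first endpoint.
            features.append(min(temperature, right))
        else:
            # Other bins hold the portion of the temperature inside (left, right].
            features.append(min(max(temperature - left, 0.0), right - left))
    return features


row = temperature_bin_features(50.0, [30, 45, 55, 65, 75, 90])
```

For 50 degrees with these endpoints, the first three bins hold 30, 15, and 5 and the remaining bins hold 0.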

eemeter.compute_time_features(index, hour_of_week=True, day_of_week=True, hour_of_day=True)[source]

Compute hour of week, day of week, or hour of day features.

Parameters:
  • index (pandas.DatetimeIndex) – Datetime index with hourly frequency.
  • hour_of_week (bool) – Include the hour_of_week feature.
  • day_of_week (bool) – Include the day_of_week feature.
  • hour_of_day (bool) – Include the hour_of_day feature.
Returns:

time_features – A dataframe with the input datetime index and up to three columns:

  • hour_of_week : Label for hour of week, 0-167, 0 is 12-1am Monday.
  • day_of_week : Label for day of week, 0-6, 0 is Monday.
  • hour_of_day : Label for hour of day, 0-23, 0 is 12-1am.

Return type:

pandas.DataFrame
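The labeling convention above can be reproduced with the standard library, where datetime.weekday() also counts Monday as 0. This is a sketch of the convention only, using a hypothetical helper on single timestamps rather than a pandas.DatetimeIndex.

```python
from datetime import datetime, timezone


def time_features(timestamp):
    """Return the three time labels for one timestamp (hypothetical helper)."""
    day_of_week = timestamp.weekday()  # 0 is Monday, 6 is Sunday
    hour_of_day = timestamp.hour       # 0 is 12-1am
    return {
        "hour_of_week": day_of_week * 24 + hour_of_day,
        "day_of_week": day_of_week,
        "hour_of_day": hour_of_day,
    }


monday_midnight = time_features(datetime(2017, 1, 2, 0, tzinfo=timezone.utc))
sunday_11pm = time_features(datetime(2017, 1, 8, 23, tzinfo=timezone.utc))
```

The first hour of Monday maps to hour_of_week 0 and the last hour of Sunday maps to 167.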

eemeter.estimate_hour_of_week_occupancy(data, segmentation=None, threshold=0.65)[source]

Estimate occupancy features for each segment.

Parameters:
  • data (pandas.DataFrame) – Input data for the weighted least squares (“meter_value ~ cdd_65 + hdd_50”) used to estimate occupancy. Must contain meter_value, hour_of_week, cdd_65, and hdd_50 columns with an hourly pandas.DatetimeIndex.
  • segmentation (pandas.DataFrame, default None) – A segmentation expressed as a dataframe which shares the timeseries index of the data and has named columns of weights, which are of the form returned by eemeter.segment_time_series.
  • threshold (float, default 0.65) – To be marked as unoccupied, the ratio of points with negative residuals in the weighted least squares in a particular hour of week must exceed this threshold. Said another way, in the default case, if more than 35% of values are greater than the basic degree day model for any particular hour of the week, that hour of week is marked as being occupied.
Returns:

occupancy_lookup – The occupancy lookup has a categorical index with values from 0 to 167, one for each hour of the week, and boolean values indicating occupied (1, True) or unoccupied (0, False) status for each of the segments. Each segment has a column labeled by its segment name.

Return type:

pandas.DataFrame
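The threshold rule can be sketched as follows, assuming the regression residuals have already been grouped by hour of week. The helper occupancy_from_residuals and the dict layout are hypothetical. The regression itself and the segmentation are omitted.

```python
def occupancy_from_residuals(residuals_by_hour_of_week, threshold=0.65):
    """Map each hour of week to True (occupied) or False (unoccupied).

    Hypothetical helper; residuals are observed minus modeled values.
    """
    lookup = {}
    for hour_of_week, residuals in residuals_by_hour_of_week.items():
        negative_ratio = sum(1 for r in residuals if r < 0) / len(residuals)
        # An hour is unoccupied only when negative residuals exceed the threshold.
        lookup[hour_of_week] = not (negative_ratio > threshold)
    return lookup


residuals = {
    0: [-1.0, -2.0, -1.5, 0.5],  # 75% negative: usage mostly below the model
    1: [1.0, -1.0, 2.0, 3.0],    # 25% negative: usage mostly above the model
}
occupancy = occupancy_from_residuals(residuals)
```

With the default threshold, hour 0 is marked unoccupied and hour 1 is marked occupied.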

eemeter.fit_temperature_bins(data, segmentation=None, occupancy_lookup=None, default_bins=[30, 45, 55, 65, 75, 90], min_temperature_count=20)[source]

Determine appropriate temperature bins for a particular set of temperature data given segmentation and occupancy.

Parameters:
  • data (pandas.Series) – Input temperature data with an hourly pandas.DatetimeIndex.
  • segmentation (pandas.DataFrame, default None) – A dataframe containing segment weights with one column per segment. If left off, segmentation will not be considered.
  • occupancy_lookup (pandas.DataFrame, default None) – A dataframe of the form returned by eemeter.estimate_hour_of_week_occupancy containing occupancy for each segment. If None, occupancy will not be considered.
  • default_bins (list of float or int) – A list of candidate bin endpoints to begin the search with.
  • min_temperature_count (int) – The minimum number of temperature values that must be included in any bin. If this threshold is not met, bins are dropped from the outside in following the algorithm described in the CalTRACK documentation.
Returns:

temperature_bins (pandas.DataFrame or, if occupancy_lookup is provided, a two-tuple of pandas.DataFrame) – A dataframe with boolean values indicating whether or not a bin was kept, with a categorical index for each candidate bin endpoint and a column for each segment.

eemeter.get_missing_hours_of_week_warning(hours_of_week)[source]

Warn if any hours of week (0-167) are missing.

Parameters: hours_of_week (pandas.Series) – Hour of week feature as given by eemeter.compute_time_features.
Returns: warning – Warning with qualified name “eemeter.hour_of_week.missing”.
Return type: eemeter.EEMeterWarning

eemeter.merge_features(features, keep_partial_nan_rows=False)[source]

Combine dataframes of features which share a datetime index.

Parameters:
  • features (list of pandas.DataFrame) – List of dataframes to be concatenated to share an index.
  • keep_partial_nan_rows (bool, default False) – If True, don’t overwrite partial rows with NaN, otherwise any row with a NaN value gets changed to all NaN values.
Returns:

merged_features – A single dataframe with the index of the input data and all of the columns in the input feature dataframes.

Return type:

pandas.DataFrame
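The effect of keep_partial_nan_rows can be illustrated on plain lists of rows. This sketch covers only the NaN handling described above, not the concatenation of dataframes, and the helper blank_partial_nan_rows is hypothetical.

```python
import math


def blank_partial_nan_rows(rows, keep_partial_nan_rows=False):
    """Replace any row containing a NaN with an all-NaN row unless told to keep it.

    Hypothetical helper for illustration only.
    """
    result = []
    for row in rows:
        has_nan = any(math.isnan(value) for value in row)
        if has_nan and not keep_partial_nan_rows:
            result.append([math.nan] * len(row))
        else:
            result.append(list(row))
    return result


rows = [[1.0, 10.0], [math.nan, 12.0], [3.0, 14.0]]
merged = blank_partial_nan_rows(rows)
kept = blank_partial_nan_rows(rows, keep_partial_nan_rows=True)
```

By default the second row becomes entirely NaN, while keep_partial_nan_rows=True preserves its valid value of 12.0.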

Input and Output Utilities

These functions are used for reading and writing meter and temperature data.

eemeter.meter_data_from_csv(filepath_or_buffer, tz=None, start_col='start', value_col='value', gzipped=False, freq=None, **kwargs)[source]

Load meter data from a CSV file.

Default format:

start,value
2017-01-01T00:00:00+00:00,0.31
2017-01-02T00:00:00+00:00,0.4
2017-01-03T00:00:00+00:00,0.58

Parameters:
  • filepath_or_buffer (str or file-handle) – File path or object.
  • tz (str, optional) – E.g., 'UTC' or 'US/Pacific'
  • start_col (str, optional, default 'start') – Date period start column.
  • value_col (str, optional, default 'value') – Value column, can be in any unit.
  • gzipped (bool, optional) – Whether file is gzipped.
  • freq (str, optional) – If given, apply frequency to data using pandas.DataFrame.resample.
  • **kwargs – Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.
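For readers who want to check a file against the default format before loading it, the sketch below parses the same layout with the standard library. It is not how the library reads the file (the parameter list above indicates pandas.read_csv is used). The helper read_meter_csv is hypothetical and returns a list of tuples rather than a DataFrame.

```python
import csv
import io
from datetime import datetime

CSV_TEXT = """start,value
2017-01-01T00:00:00+00:00,0.31
2017-01-02T00:00:00+00:00,0.4
2017-01-03T00:00:00+00:00,0.58
"""


def read_meter_csv(buffer, start_col="start", value_col="value"):
    """Parse the default meter CSV layout into (timestamp, value) tuples.

    Hypothetical helper for illustration only.
    """
    rows = []
    for record in csv.DictReader(buffer):
        # ISO 8601 timestamps with an offset parse to timezone-aware datetimes.
        timestamp = datetime.fromisoformat(record[start_col])
        rows.append((timestamp, float(record[value_col])))
    return rows


rows = read_meter_csv(io.StringIO(CSV_TEXT))
```

Each parsed timestamp carries its UTC offset, which matters because several CalTRACK steps expect timezone-aware data.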

eemeter.meter_data_from_json(data, orient='list')[source]

Load meter data from json.

Default format:

[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-02-01T00:00:00+00:00', 0.4],
    ['2017-03-01T00:00:00+00:00', 0.46],
]

records format:

[
    {'start': '2017-01-01T00:00:00+00:00', 'value': 3.5},
    {'start': '2017-02-01T00:00:00+00:00', 'value': 0.4},
    {'start': '2017-03-01T00:00:00+00:00', 'value': 0.46},
]

Parameters:
  • data (list) – A list of meter data, with each row representing a single record.
  • orient (str, optional) –
    Format of the data parameter:
    • list (a list of lists, with the first element as start date)
    • records (a list of dicts)
Returns:

df – DataFrame with a single column ('value') and a pandas.DatetimeIndex. A second column ('estimated') may also be included if the input data contained an estimated boolean flag.

Return type:

pandas.DataFrame
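The two orient values describe the same information in different shapes. The sketch below normalizes both into a common list of (timestamp, value) tuples with the standard library. The helper parse_meter_json is hypothetical and does not build a DataFrame.

```python
from datetime import datetime


def parse_meter_json(data, orient="list"):
    """Normalize 'list' or 'records' input into (timestamp, value) tuples.

    Hypothetical helper for illustration only.
    """
    if orient == "list":
        pairs = [(row[0], row[1]) for row in data]
    elif orient == "records":
        pairs = [(row["start"], row["value"]) for row in data]
    else:
        raise ValueError("orient must be 'list' or 'records'")
    return [(datetime.fromisoformat(start), float(value)) for start, value in pairs]


as_list = [
    ["2017-01-01T00:00:00+00:00", 3.5],
    ["2017-02-01T00:00:00+00:00", 0.4],
]
as_records = [
    {"start": "2017-01-01T00:00:00+00:00", "value": 3.5},
    {"start": "2017-02-01T00:00:00+00:00", "value": 0.4},
]
from_list = parse_meter_json(as_list, orient="list")
from_records = parse_meter_json(as_records, orient="records")
```

Both calls produce the same parsed rows, so the choice of orient depends only on how the upstream data is shaped.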

eemeter.meter_data_to_csv(meter_data, path_or_buf)[source]

Write meter data to CSV. See also pandas.DataFrame.to_csv.

Parameters:
  • meter_data (pandas.DataFrame) – Meter data DataFrame with 'value' column and pandas.DatetimeIndex.
  • path_or_buf (str or file handle, default None) – File path or object, if None is provided the result is returned as a string.

eemeter.temperature_data_from_csv(filepath_or_buffer, tz=None, date_col='dt', temp_col='tempF', gzipped=False, freq=None, **kwargs)[source]

Load temperature data from a CSV file.

Default format:

dt,tempF
2017-01-01T00:00:00+00:00,21
2017-01-01T01:00:00+00:00,22.5
2017-01-01T02:00:00+00:00,23.5

Parameters:
  • filepath_or_buffer (str or file-handle) – File path or object.
  • tz (str, optional) – E.g., 'UTC' or 'US/Pacific'
  • date_col (str, optional, default 'dt') – Date period start column.
  • temp_col (str, optional, default 'tempF') – Temperature column.
  • gzipped (bool, optional) – Whether file is gzipped.
  • freq (str, optional) – If given, apply frequency to data using pandas.Series.resample.
  • **kwargs – Extra keyword arguments to pass to pandas.read_csv, such as sep='|'.

eemeter.temperature_data_from_json(data, orient='list')[source]

Load temperature data from json. (Must be given in degrees Fahrenheit).

Default format:

[
    ['2017-01-01T00:00:00+00:00', 3.5],
    ['2017-01-01T01:00:00+00:00', 5.4],
    ['2017-01-01T02:00:00+00:00', 7.4],
]
Parameters:data (list) – Each list element is a row of data.
Returns:series – Temperature series ('tempF') with a pandas.DatetimeIndex.
Return type:pandas.Series
eemeter.temperature_data_to_csv(temperature_data, path_or_buf)[source]

Write temperature data to CSV. See also pandas.DataFrame.to_csv.

Parameters:
  • temperature_data (pandas.Series) – Temperature data series with pandas.DatetimeIndex.
  • path_or_buf (str or file handle, default None) – File path or object. If None is provided, the result is returned as a string.

Metrics

This class is used for computing model metrics.

class eemeter.ModelMetrics(observed_input, predicted_input, num_parameters=1, autocorr_lags=1, confidence_level=0.9)[source]

Contains measures of model fit and summary statistics on the input series.

Parameters:
  • observed_input (pandas.Series) – Series with pandas.DatetimeIndex with a set of electricity or gas meter values.
  • predicted_input (pandas.Series) – Series with pandas.DatetimeIndex with a set of electricity or gas meter values.
  • num_parameters (int, optional) – The number of parameters (excluding the intercept) used in the regression from which the predictions were derived.
  • autocorr_lags (int, optional) – The number of lags to use when calculating the autocorrelation of the residuals.
  • confidence_level (float, optional) – Confidence level used in fractional savings uncertainty computations.
observed_length

The length of the observed_input series.

Type:int
predicted_length

The length of the predicted_input series.

Type:int
merged_length

The length of the dataframe resulting from the inner join of the observed_input series and the predicted_input series.

Type:int
observed_mean

The mean of the observed_input series.

Type:float
predicted_mean

The mean of the predicted_input series.

Type:float
observed_skew

The skew of the observed_input series.

Type:float
predicted_skew

The skew of the predicted_input series.

Type:float
observed_kurtosis

The excess kurtosis of the observed_input series.

Type:float
predicted_kurtosis

The excess kurtosis of the predicted_input series.

Type:float
observed_cvstd

The coefficient of standard deviation of the observed_input series.

Type:float
predicted_cvstd

The coefficient of standard deviation of the predicted_input series.

Type:float
r_squared

The r-squared of the model from which the predicted_input series was produced.

Type:float
r_squared_adj

The r-squared of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.

Type:float
cvrmse

The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series.

Type:float
cvrmse_adj

The coefficient of variation (root-mean-squared error) of the predicted_input series relative to the observed_input series, adjusted by the number of parameters in the model.

Type:float
mape

The mean absolute percent error of the predicted_input series relative to the observed_input series.

Type:float
mape_no_zeros

The mean absolute percent error of the predicted_input series relative to the observed_input series, with all time periods dropped where the observed_input series was not greater than zero.

Type:float
num_meter_zeros

The number of time periods for which the observed_input series was not greater than zero.

Type:int
nmae

The normalized mean absolute error of the predicted_input series relative to the observed_input series.

Type:float
nmbe

The normalized mean bias error of the predicted_input series relative to the observed_input series.

Type:float
autocorr_resid

The autocorrelation of the residuals (where the residuals equal the predicted_input series minus the observed_input series), measured using a number of lags equal to autocorr_lags.

Type:float
n_prime

The number of baseline inputs corrected for autocorrelation – used in fractional savings uncertainty computation.

Type:float
single_tailed_confidence_level

The adjusted confidence level for use in single-sided tests.

Type:float
degrees_of_freedom

Maximum number of independent variables which have the freedom to vary.

Type:float
t_stat

t-statistic, used for hypothesis testing

Type:float
cvrmse_auto_corr_correction

Correction factor to apply to cvrmse to account for autocorrelation of inputs.

Type:float
approx_factor_auto_corr_correction

Approximation factor used in ASHRAE Guideline 14 for uncertainty computation.

Type:float
fsu_base_term

Base term used in fractional savings uncertainty computation.

Type:float
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.
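
Several of the ModelMetrics attributes listed above follow common textbook definitions. The sketch below computes a few of them in plain Python under stated assumptions: residuals are predicted minus observed (as described for autocorr_resid), no adjustment for the number of parameters is applied, and `n_prime` uses the commonly cited form n' = n(1 - rho)/(1 + rho). The function names are hypothetical and the library's exact formulas may differ.

```python
import math


def fit_metrics(observed, predicted):
    """Compute a few fit metrics from paired observed/predicted values."""
    n = len(observed)
    # Residuals follow the documented sign convention: predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    observed_mean = sum(observed) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    # Periods where the observed value is not greater than zero are dropped
    # before computing the percent error, as described for mape_no_zeros.
    nonzero = [(o, r) for o, r in zip(observed, residuals) if o > 0]
    if nonzero:
        mape_no_zeros = sum(abs(r / o) for o, r in nonzero) / len(nonzero)
    else:
        mape_no_zeros = float("nan")
    return {
        "cvrmse": rmse / observed_mean,
        "nmbe": (sum(residuals) / n) / observed_mean,
        "nmae": (sum(abs(r) for r in residuals) / n) / observed_mean,
        "mape_no_zeros": mape_no_zeros,
        "num_meter_zeros": n - len(nonzero),
    }


def n_prime(n, autocorr):
    """Effective number of observations given residual autocorrelation."""
    return n * (1 - autocorr) / (1 + autocorr)


metrics = fit_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
effective_n = n_prime(100, 0.5)
```

Positive residual autocorrelation shrinks the effective sample size, which in turn widens the uncertainty on savings estimates.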

Sample Data

These sample data are provided to make things easier for new users.

eemeter.samples()[source]

Load a list of sample data identifiers.

Returns:samples – List of sample identifiers for use with eemeter.load_sample.
Return type:list of str
eemeter.load_sample(sample)[source]

Load meter data, temperature data, and metadata associated with a particular sample identifier. Note: samples are simulated, not real, data.

Parameters:sample (str) – Identifier of sample. Complete list can be obtained with eemeter.samples.
Returns:meter_data, temperature_data, metadata – Meter data, temperature data, and metadata for this sample identifier.
Return type:tuple of pandas.DataFrame, pandas.Series, and dict

Segmentation

These methods are used within CalTRACK hourly to support building multiple partial models and combining them into one full model.

eemeter.iterate_segmented_dataset(data, segmentation=None, feature_processor=None, feature_processor_kwargs=None, feature_processor_segment_name_mapping=None)[source]

A utility for iterating over segments which allows providing a function for processing outputs into features.

Parameters:
  • data (pandas.DataFrame, required) – Data to segment.
  • segmentation (pandas.DataFrame, default None) – A segmentation of the input dataframe expressed as a dataframe which shares the timeseries index of the data and has named columns of weights, which are iterated over to create the outputs (or inputs to the feature processor, which then creates the actual outputs).
  • feature_processor (function, default None) – A function that transforms raw inputs (temperatures) into features for each segment.
  • feature_processor_kwargs (dict, default None) – A dict of keyword arguments to be passed as **kwargs to the feature_processor function.
  • feature_processor_segment_name_mapping (dict, default None) – A mapping from the default segmentation segment names to alternate names. This is useful when prediction uses a different segment type than fitting.
eemeter.segment_time_series(index, segment_type='single', drop_zero_weight_segments=False)[source]

Split a time series index into segments by applying weights.

Parameters:
  • index (pandas.DatetimeIndex) – A time series index which gets split into segments.
  • segment_type (str) –

    The method to use when creating segments.

    • “single”: creates one big segment with the name “all”.
    • “one_month”: creates up to twelve segments, each of which contains a single month. Segment names are “jan”, “feb”, … “dec”.
    • “three_month”: creates up to twelve overlapping segments, each of which contains three calendar months of data. Segment names are “dec-jan-feb”, “jan-feb-mar”, … “nov-dec-jan”.
    • “three_month_weighted”: creates up to twelve overlapping segments, each of which contains three calendar months of data, with the first and third months in each segment having weights of one half. Segment names are “dec-jan-feb-weighted”, “jan-feb-mar-weighted”, … “nov-dec-jan-weighted”.
Returns:

segmentation – A segmentation of the input index expressed as a dataframe which shares the input index and has named columns of weights.

Return type:

pandas.DataFrame
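
The “three_month_weighted” scheme described above can be sketched in plain Python. This is an illustration of the weighting rule only, assuming each segment is centered on one month (weight 1) with its two neighbors at weight one half; the function `three_month_weighted` is hypothetical and returns a dict for one timestamp rather than a DataFrame of weights.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted(month):
    """Return {segment_name: weight} for a timestamp in `month` (1-12)."""
    weights = {}
    for center in range(12):
        prev_m = (center - 1) % 12
        next_m = (center + 1) % 12
        name = "-".join([MONTHS[prev_m], MONTHS[center], MONTHS[next_m]])
        name += "-weighted"
        if month - 1 == center:
            weights[name] = 1.0  # middle month of the segment
        elif month - 1 in (prev_m, next_m):
            weights[name] = 0.5  # first or third month of the segment
        else:
            weights[name] = 0.0  # month is outside this segment
    return weights


# Weights that a January timestamp contributes to each of the 12 segments.
january = three_month_weighted(1)
```

Each timestamp therefore contributes to three overlapping segments, with a total weight of two.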

class eemeter.CalTRACKSegmentModel(segment_name, model, formula, model_params, warnings=None)[source]

An object that captures the model fit for one segment.

segment_name

The name of the segment of data this model was fit to.

Type:str
model

The fitted model object.

Type:object
formula

The formula of the model regression.

Type:str
model_params

A dictionary of parameters

Type:dict
warnings

A list of eemeter warnings.

Type:list
classmethod from_json(data)[source]

Loads a JSON-serializable representation into the model state.

The input of this function is a dict which can be the result of json.loads.

json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

predict(data)[source]

A function which takes input data and predicts for this segment model.

class eemeter.SegmentedModel(segment_models, prediction_segment_type, prediction_segment_name_mapping=None, prediction_feature_processor=None, prediction_feature_processor_kwargs=None)[source]

Represent a model which has been broken into multiple model segments (for CalTRACK Hourly, these are month-by-month segments, each of which is associated with a different model).

Parameters:
  • segment_models (dict of eemeter.CalTRACKSegmentModel) – Dictionary of segment models, keyed by segment name.
  • prediction_segment_type (str) – Any segment_type that can be passed to eemeter.segment_time_series, currently “single”, “one_month”, “three_month”, or “three_month_weighted”.
  • prediction_segment_name_mapping (dict of str) – A dictionary mapping the segment names for the segment type used for predicting to the segment names for the segment type used for fitting, e.g., {“<predict_segment_name>”: “<fit_segment_name>”}.
  • prediction_feature_processor (function) – A function that transforms raw inputs (temperatures) into features for each segment.
  • prediction_feature_processor_kwargs (dict) – A dict of keyword arguments to be passed as **kwargs to the prediction_feature_processor function.
json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.

predict(prediction_index, temperature, **kwargs)[source]

Predict over a prediction index by combining results from all models.

Parameters:
  • prediction_index (pandas.DatetimeIndex) – The index over which to predict.
  • temperature (pandas.Series) – Hourly temperatures.
  • **kwargs – Extra arguments will be ignored.
class eemeter.HourlyModelPrediction(result)
result

Alias for field number 0

Transformation utilities

These functions perform various common data transformations on pandas inputs.

eemeter.as_freq(data_series, freq, atomic_freq='1 Min', series_type='cumulative', include_coverage=False)[source]

Resample data to a different frequency.

This method can be used to upsample or downsample meter data. To do so, it assumes that meter data is constant and averaged over the given periods. For instance, to convert billing-period data to daily data, this method first upsamples to the atomic frequency (1-minute frequency, by default), “spreading” usage evenly across all minutes in each period. Then it downsamples to daily frequency and returns that result. With instantaneous series, the data is copied to all contiguous time intervals and the mean over freq is returned.

Caveats:

  • This method gives a fair amount of flexibility in resampling as long as you are OK with the assumption that usage is constant over the period (this assumption is generally broken in observed data at large enough frequencies, so this caveat should not be taken lightly).
Parameters:
  • data_series (pandas.Series) – Data to resample. Should have a pandas.DatetimeIndex.
  • freq (str) – The frequency to resample to. This should be given in a form recognized by the pandas.Series.resample method.
  • atomic_freq (str, optional) – The “atomic” frequency of the intermediate data form. This can be adjusted to a coarser atomic frequency (a longer interval) to improve speed or reduce memory use.
  • series_type (str, {‘cumulative’, ‘instantaneous’}, default ‘cumulative’) – Type of data sampling. ‘cumulative’ data can be spread over smaller time intervals and is aggregated using addition (e.g. meter data). ‘instantaneous’ data is copied (not spread) over smaller time intervals and is aggregated by averaging (e.g. weather data).
  • include_coverage (bool, default False) – Whether to return a series with just the resampled values or a dataframe with a column that includes percent coverage of source data used for each sample.
Returns:

resampled_data – Data resampled to the given frequency (optionally as a dataframe with a coverage column if include_coverage is used).

Return type:

pandas.Series or pandas.DataFrame
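
The “spreading” step for cumulative data can be illustrated with a minimal sketch. For brevity it assumes whole days as the atomic unit instead of minutes and treats each period's end date as exclusive; `spread_to_daily` is a hypothetical helper, not the library's implementation.

```python
from datetime import date, timedelta


def spread_to_daily(periods):
    """Spread each period's total usage evenly across its days.

    `periods` is a list of (start_date, end_date, usage) tuples, where
    end_date is exclusive (an assumption made for this sketch).
    """
    daily = {}
    for start, end, usage in periods:
        n_days = (end - start).days
        per_day = usage / n_days  # constant usage within the period
        for offset in range(n_days):
            daily[start + timedelta(days=offset)] = per_day
    return daily


billing = [
    (date(2017, 1, 1), date(2017, 1, 11), 100.0),  # 10 days
    (date(2017, 1, 11), date(2017, 1, 16), 25.0),  # 5 days
]
daily = spread_to_daily(billing)
```

Because usage is only redistributed, the daily values sum back to the original billing totals.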

eemeter.day_counts(index)[source]

Days between DatetimeIndex values as a pandas.Series.

Parameters:index (pandas.DatetimeIndex) – The index for which to get day counts.
Returns:day_counts – A pandas.Series with counts of days between periods. Counts are given on start dates of periods.
Return type:pandas.Series
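
As a minimal sketch of the idea, the day count for each period is the gap between its start and the next start. The helper `day_counts_sketch` below is hypothetical and uses plain datetimes; how the final index value is handled (here it is simply omitted) is an assumption, not documented behavior.

```python
from datetime import datetime, timezone


def day_counts_sketch(index):
    """Map each period start to the number of days until the next start."""
    return {
        start: (nxt - start).total_seconds() / 86400
        for start, nxt in zip(index, index[1:])
    }


index = [
    datetime(2017, 1, 1, tzinfo=timezone.utc),
    datetime(2017, 2, 1, tzinfo=timezone.utc),
    datetime(2017, 3, 1, tzinfo=timezone.utc),
]
counts = day_counts_sketch(index)
```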
eemeter.get_baseline_data(data, start=None, end=None, max_days=365, allow_billing_period_overshoot=False, n_days_billing_period_overshoot=None, ignore_billing_period_gap_for_day_count=False)[source]

Filter down to baseline period data.

Note

For compliance with CalTRACK, set max_days=365 (section 2.2.1.1).

Parameters:
  • data (pandas.DataFrame or pandas.Series) – The data to filter to baseline data. This data will be filtered down to an acceptable baseline period according to the dates passed as start and end, or the maximum period specified with max_days.
  • start (datetime.datetime) – A timezone-aware datetime that represents the earliest allowable start date for the baseline data. The stricter of this or max_days is used to determine the earliest allowable baseline period date.
  • end (datetime.datetime) – A timezone-aware datetime that represents the latest allowable end date for the baseline data, i.e., the latest date for which data is available before the intervention begins.
  • max_days (int, default 365) – The maximum length of the period. Ignored if end is not set. The stricter of this or start is used to determine the earliest allowable baseline period date.
  • allow_billing_period_overshoot (bool, default False) – If True, count max_days from the end of the last billing data period that ends before the end date, rather than from the exact end date. Otherwise use the exact end date as the cutoff.
  • n_days_billing_period_overshoot (int, default None) – If allow_billing_period_overshoot is set to True, this determines the number of days of overshoot that will be tolerated. A value of None implies that any number of days is allowed.
  • ignore_billing_period_gap_for_day_count (bool, default False) –

    If True, instead of going back max_days from either the end date or end of the last billing period before that date (depending on the value of the allow_billing_period_overshoot setting) and excluding the last period that began before that date, first check to see if excluding or including that period gets closer to a total of max_days of data.

    For example, with max_days=365, if an exact 365-day period would target Feb 15, but the billing period went from Jan 20 to Feb 20, exclude that period for a total of ~360 days of data, because that’s closer to 365 than ~390 days, which would be the total if that period was included. If, on the other hand, that period started Feb 10 and went to Mar 10, include the period, because ~370 days of data is closer to 365 than ~340.

Returns:

baseline_data, warnings – Data for only the specified baseline period and any associated warnings.

Return type:

tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)
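
The interaction of start, end, and max_days described above can be sketched as follows. This is an illustration under assumptions: both boundaries are treated as inclusive, and the helper names `baseline_window` and `include_boundary_period` are hypothetical. The second helper mirrors the “closer to max_days” comparison used in the ignore_billing_period_gap_for_day_count example.

```python
from datetime import datetime, timedelta, timezone


def baseline_window(timestamps, end, start=None, max_days=365):
    """Keep timestamps within the baseline period ending at `end`."""
    earliest = end - timedelta(days=max_days)
    if start is not None:
        earliest = max(earliest, start)  # the stricter (later) bound wins
    return [t for t in timestamps if earliest <= t <= end]


def include_boundary_period(days_if_excluded, days_if_included, max_days=365):
    """True if including the boundary period lands closer to max_days."""
    return abs(days_if_included - max_days) < abs(days_if_excluded - max_days)


end = datetime(2017, 3, 1, tzinfo=timezone.utc)
timestamps = [end - timedelta(days=d) for d in (400, 365, 200, 0)]

kept = baseline_window(timestamps, end)
kept_strict = baseline_window(timestamps, end, start=end - timedelta(days=100))
```

With the numbers from the example, `include_boundary_period(360, 390)` is False (exclude the period) and `include_boundary_period(340, 370)` is True (include it).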

eemeter.get_reporting_data(data, start=None, end=None, max_days=365, allow_billing_period_overshoot=False, ignore_billing_period_gap_for_day_count=False)[source]

Filter down to reporting period data.

Parameters:
  • data (pandas.DataFrame or pandas.Series) – The data to filter to reporting data. This data will be filtered down to an acceptable reporting period according to the dates passed as start and end, or the maximum period specified with max_days.
  • start (datetime.datetime) – A timezone-aware datetime that represents the earliest allowable start date for the reporting data, i.e., the earliest date for which data is available after the intervention begins.
  • end (datetime.datetime) – A timezone-aware datetime that represents the latest allowable end date for the reporting data. The stricter of this or max_days is used to determine the latest allowable reporting period date.
  • max_days (int, default 365) – The maximum length of the period. Ignored if start is not set. The stricter of this or end is used to determine the latest allowable reporting period date.
  • allow_billing_period_overshoot (bool, default False) – If True, count max_days from the start of the first billing data period that starts after the start date, rather than from the exact start date. Otherwise use the exact start date as the cutoff.
  • ignore_billing_period_gap_for_day_count (bool, default False) –

    If True, instead of going forward max_days from either the start date or the start of the first billing period after that date (depending on the value of the allow_billing_period_overshoot setting) and excluding the first period that ended after that date, first check to see if excluding or including that period gets closer to a total of max_days of data.

    For example, with max_days=365, if an exact 365-day period would target Feb 15, but the billing period went from Jan 20 to Feb 20, include that period for a total of ~370 days of data, because that’s closer to 365 than ~340 days, which would be the total if that period was excluded. If, on the other hand, that period started Feb 10 and went to Mar 10, exclude the period, because ~360 days of data is closer to 365 than ~390.

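Returns:

reporting_data, warnings – Data for only the specified reporting period and any associated warnings.

Return type:

tuple of (pandas.DataFrame or pandas.Series, list of eemeter.EEMeterWarning)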

class eemeter.Term(index, label, target_start_date, target_end_date, target_term_length_days, actual_start_date, actual_end_date, actual_term_length_days, complete)[source]

The term object represents a subset of an index.

index

The index of the term. Includes a period at the end meant to be NaN-value.

Type:pandas.DatetimeIndex
label

The label for the term.

Type:str
target_start_date

The start date inferred for this term from the start date and target term lengths.

Type:pandas.Timestamp or datetime.datetime
target_end_date

The end date inferred for this term from the start date and target term lengths.

Type:pandas.Timestamp or datetime.datetime
target_term_length_days

The number of days targeted for this term.

Type:int
actual_start_date

The first date in the index.

Type:pandas.Timestamp
actual_end_date

The last date in the index.

Type:pandas.Timestamp
actual_term_length_days

The number of days between the actual start date and actual end date.

Type:int
complete

True if this term is conclusively complete, such that additional data added to the series would not add more data to this term.

Type:bool
eemeter.get_terms(index, term_lengths, term_labels=None, start=None, method='strict')[source]

Breaks a pandas.DatetimeIndex into consecutive terms of specified lengths.

Parameters:
  • index (pandas.DatetimeIndex) – The index to split into terms, generally meter_data.index or temperature_data.index.
  • term_lengths (list of int) – The lengths (in days) of the terms into which to split the data.
  • term_labels (list of str, default None) – Labels to use for each term. List must be the same length as the term_lengths list.
  • start (datetime.datetime, default None) – A timezone-aware datetime that represents the earliest allowable start date for the terms. If None, use the first element of the index.
  • method (one of ['strict', 'nearest'], default 'strict') –

    The method to use to get terms.

    • “strict”: Ensures that the term end will come on or before the length of
Returns:

terms – A list of eemeter.Term objects, each covering a consecutive portion of the given index. These can be used to filter the original data into terms of approximately the desired length.

Return type:

list of eemeter.Term
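
The target dates of consecutive terms follow directly from the start date and the term lengths. The sketch below shows that arithmetic only; `target_term_bounds` and its default labels are hypothetical, and it does not reproduce how the 'strict' or 'nearest' methods snap terms to the actual index.

```python
from datetime import datetime, timedelta, timezone


def target_term_bounds(start, term_lengths, term_labels=None):
    """Return (label, target_start, target_end) for consecutive terms."""
    if term_labels is None:
        # Hypothetical default labels, chosen only for this sketch.
        term_labels = ["term_{}".format(i + 1) for i in range(len(term_lengths))]
    bounds = []
    term_start = start
    for label, length in zip(term_labels, term_lengths):
        term_end = term_start + timedelta(days=length)
        bounds.append((label, term_start, term_end))
        term_start = term_end  # the next term begins where this one ends
    return bounds


start = datetime(2017, 1, 1, tzinfo=timezone.utc)
bounds = target_term_bounds(start, [365, 365], ["year1", "year2"])
```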

eemeter.remove_duplicates(df_or_series)[source]

Remove duplicate rows or values by keeping the first of each duplicate.

Parameters:df_or_series (pandas.DataFrame or pandas.Series) – Pandas object from which to drop duplicate index values.
Returns:deduplicated – The deduplicated pandas object.
Return type:pandas.DataFrame or pandas.Series
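
The keep-first rule can be shown with a minimal plain-Python sketch over (index value, value) pairs. The helper `keep_first` is hypothetical; with pandas objects, a comparable effect is usually obtained by filtering on `index.duplicated(keep="first")`.

```python
def keep_first(rows):
    """Drop rows whose index value was already seen, keeping the first."""
    seen = set()
    deduplicated = []
    for index_value, value in rows:
        if index_value not in seen:
            seen.add(index_value)
            deduplicated.append((index_value, value))
    return deduplicated


rows = [
    ("2017-01-01", 3.5),
    ("2017-01-01", 9.9),  # duplicate index value, dropped
    ("2017-01-02", 0.4),
]
deduplicated = keep_first(rows)
```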
eemeter.overwrite_partial_rows_with_nan(df)[source]

Version

This method can be used to verify the eemeter version.

eemeter.get_version()[source]

Visualization

These functions are used to visualize models and meter and temperature data inputs.

eemeter.plot_time_series(meter_data, temperature_data, **kwargs)[source]

Plot meter and temperature data in dual-axes time series.

Parameters:
Returns:

axes – Tuple of (ax_meter_data, ax_temperature_data).

Return type:

tuple of matplotlib.axes.Axes

eemeter.plot_energy_signature(meter_data, temperature_data, temp_col=None, ax=None, title=None, figsize=None, **kwargs)[source]

Plot meter and temperature data in energy signature.

Parameters:
Returns:

ax – Matplotlib axes.

Return type:

matplotlib.axes.Axes

Warnings

class eemeter.EEMeterWarning(qualified_name, description, data)[source]

An object representing a warning and data associated with it.

qualified_name

Qualified name, e.g., ‘eemeter.method_abc.missing_data’.

Type:str
description

Prose describing the nature of the warning.

Type:str
data

Data that reproducibly shows why the warning was issued. Data should be JSON serializable.

Type:dict
json()[source]

Return a JSON-serializable representation of this result.

The output of this function can be converted to a serialized string with json.dumps.
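
As a sketch of the json() pattern used throughout this reference, the stand-in class below carries the three documented attributes and returns a plain dict that json.dumps can serialize. `WarningSketch` is hypothetical, and the dict keys are an assumption rather than the library's confirmed output.

```python
import json


class WarningSketch:
    """Hypothetical stand-in with the three documented attributes."""

    def __init__(self, qualified_name, description, data):
        self.qualified_name = qualified_name
        self.description = description
        self.data = data  # should itself be JSON serializable

    def json(self):
        """Return a JSON-serializable dict (key names are an assumption)."""
        return {
            "qualified_name": self.qualified_name,
            "description": self.description,
            "data": self.data,
        }


warning = WarningSketch(
    "eemeter.method_abc.missing_data",
    "Some rows had no meter value.",
    {"n_missing": 3},
)
serialized = json.dumps(warning.json())
```

Keeping data JSON serializable means a warning can be logged or stored alongside model results without custom encoders.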