Data formats for gamma-ray astronomy 0.3

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Data formats for gamma-ray astronomy is a community-driven initiative to define a common and open high-level data format for gamma-ray instruments.

References

The following paper describes the context of this initiative and its evolution:

Deil, C. et al. (2017), "Open high-level data formats and software for gamma-ray astronomy", AIP Conference Proceedings 1792, 070006 (arXiv:1610.01884)

About

In gamma-ray astronomy, a variety of data formats and proprietary software have been traditionally used, often developed for one specific mission or experiment. Especially for ground-based imaging atmospheric Cherenkov telescopes (IACTs), data and software are mostly private to the collaborations operating the telescopes. The next-generation IACT instrument, the Cherenkov Telescope Array (CTA), will be the first ground-based gamma-ray telescope array operated as an open observatory with public observer access. This implies fundamentally different requirements for the data formats and software tools. Open community access is a novelty in this domain, putting a challenge on the implementation of services that make very high energy (VHE) gamma-ray astronomy as accessible as any other waveband.

To work towards open and interoperable data formats for gamma-ray astronomy, we have started this open data format specification. The effort was started at the PyGamma15 workshop in November 2015, followed by a meeting in April 2016 in Meudon and another meeting in March 2017 in Heidelberg. Version 0.1 of the spec was released in April 2016, version 0.2 in August 2018 and version 0.3 in November 2022. You can find more information about this effort in Deil et al. (2017).

The scope of this effort is to cover all high-level data from gamma-ray telescopes, starting at the level of event lists and IRFs, and also covering reduced IRFs, maps, spectra, light curves, up to source models and catalogs. Both pointing (IACT) and slewing (Fermi-LAT, HAWC) telescope data are in scope. To a large degree, the formats should also be useful for other astroparticle data, e.g. from cosmic rays or neutrinos.

All of the data formats described here can, at the moment, be stored in FITS, and in practice mostly are. Some experimentation with HDF5 and ECSV is ongoing. The data format specifications don’t explicitly mention the serialisation format, but rather the key name and data type for each metadata or data field.

The formats described here are partially in use by current instruments (Fermi-LAT, HESS, VERITAS, MAGIC, FACT) as well as in the next-generation Cherenkov Telescope Array (CTA). Prototyping and support for many of the formats is happening in the CTA science tool prototypes Gammapy and ctools. The data formats are also used e.g. in gamma-cat, a collection and source catalog for very-high-energy gamma-ray astronomy.

However, we would like to stress that this is an unofficial effort; the formats described here are work in progress and have not been officially adopted by any of the mentioned instruments. The expectation is that CTA will partially adopt these formats and partially develop new ones in the 2018-2020 time frame, and that the other IACTs will then use the official format chosen by CTA.

Development of this data format specification happens at https://github.com/open-gamma-ray-astro/gamma-astro-data-formats; information on how to contribute is given in CONTRIBUTING.md there. Discussion of major and controversial points should happen on the mailing list (https://lists.nasa.gov/mailman/listinfo/open-gamma-ray-astro). So far, no official decision process exists, so some discussions are stuck because of major differences in opinion. Moving forward, decisions will probably be made not in this open forum, but rather by the major instruments via their choices, especially CTA.

General

This section contains general information and basic definitions.

Its purpose is two-fold:

  1. Users and developers can learn or look up the nitty-gritty details of how coordinates, times, … are defined, as well as basic information about file and storage formats (e.g. how axis information for multi-dimensional arrays can be stored in FITS files).
  2. Data format specifications can refer to the definitions in this section, e.g. we don’t have to repeat that the azimuth angle is measured east of north in each format specification where azimuth is used.

Time

Introduction

This page describes how times should be stored. This is a solved problem: we follow the FITS standard. However, the FITS standard is very complex (see also the FITS time paper) and allows for different ways to store times in FITS, some of which are hard to understand and implement.

To keep things simple, we here agree on a way to store times that is fully compliant, but a subset of the ways allowed by the FITS standard (see Formats below).

This has the advantage of simplicity and uniformity for writers and readers (see Tools below).

One major point allowing for this simplicity is that we only need single 64-bit float precision for time in high-level (DL3 and up) gamma-ray astronomy, as explained in the Precision section below.

At the time of writing, this general page on time is referenced by the EVENTS, GTI and POINTING table specifications below.

Formats

In tables, times should be given as 64-bit float values. For all the tables and columns mentioned in the introduction, the times are given as seconds since a reference time point.

The reference time point is specified by the following FITS header keywords:

  • MJDREFI type: int, unit: days
    • Integer part of instrument specific MJD time reference
  • MJDREFF type: float, unit: days
    • Float part of instrument specific MJD time reference
  • TIMEUNIT type: string
    • Time unit (e.g. ‘s’)
  • TIMESYS type: string
    • Time system, also referred to as time scale (e.g. ‘UT1’, ‘UTC’, ‘TT’, ‘TAI’, see Table 30 of the FITS standard)
  • TIMEREF type: string
    • Time reference frame, used for example for barycentric corrections (options: ‘TOPOCENTER’, ‘GEOCENTER’, ‘BARYCENTER’, ‘HELIOCENTER’, and more see Table 31 of the FITS standard)

See the FITS standard and the FITS time paper for further information.

For light curves (not specified yet), the use of TIMEUNIT='d' (days) is also common.

Some existing files don’t give TIMEUNIT and / or TIMEREF. In those cases, science tools should assume defaults of TIMEUNIT='s' and TIMEREF='LOCAL'. No defaults exist for MJDREFI, MJDREFF and TIMESYS, if those are missing science tools should raise an error and exit. New files should always be written with all five header keys.
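
For example, here is a minimal sketch (using astropy.time; the reference values shown are hypothetical examples) of how a reader can combine these keywords with a value from a TIME column:

import astropy.units as u
from astropy.time import Time

# Hypothetical header values: MJDREFI=51910, MJDREFF=0.00074287037,
# TIMESYS='TT', TIMEUNIT='s'
ref_time = Time(51910, 0.00074287037, format='mjd', scale='tt')
event_time = ref_time + 123456.789 * u.s  # TIME column value in seconds
print(event_time.utc.isot)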

In addition to this main way of specifying times as floating point numbers with respect to a reference time point, the following header keys with date and time values as strings can be added. This is for convenience and for humans reading the information; science tools will usually not access this redundant and optional information. The time system used should be the one given by TIMESYS. All values for the keywords mentioned here shall be ISO 8601 date and time strings, yyyy-mm-ddTHH:MM:SS.fff. The precision is not fixed, so seconds or fractional seconds may be omitted.

  • DATE-OBS type: string
    • Observation date and time, typically the start of observations if not defined otherwise in the comment
  • DATE-BEG type: string
    • Observation start date and time
  • DATE-AVG type: string
    • Average observation date and time
  • DATE-END type: string
    • Observation end date and time

Tools

The SOFA Time Scale and Calendar Tools document provides a detailed description of times in the high-precision IAU SOFA library, which is the gold standard for times in astronomy. The SOFA time routines are available via the Astropy time Python package, which makes it easy to convert between different time scales (utc, tt and mjd in this example).

>>> from astropy.time import Time
>>> time = Time('2011-01-01 00:00:00', scale='utc', format='iso')
>>> time
<Time object: scale='utc' format='iso' value=2011-01-01 00:00:00.000>
>>> time.tt
<Time object: scale='tt' format='iso' value=2011-01-01 00:01:06.184>
>>> time.mjd
55562.0

as well as different time formats (iso, isot and fits in this example)

>>> time.iso
'2011-01-01 00:00:00.000'
>>> time.isot
'2011-01-01T00:00:00.000'
>>> time.fits
'2011-01-01T00:00:00.000(UTC)'

If you don’t want to install SOFA or Astropy (or to double-check), you can use the xTime time conversion utility provided by HEASARC as a web tool.

  • Gammapy uses Astropy time, with custom utility functions for FITS I/O to write in the formats recommended here.
  • Gammalib has custom code to handle times that is described in Times in Gammalib.

Precision

Depending on the use case and required precision, times are stored as strings or as one or several integer or floating point numbers. Tools usually use one 64-bit or two 64-bit floating point numbers for time calculations.

For high-level gamma-ray astronomy, the situation can be summarised like this (see sub-section computation below for details):

  • Do use single 64-bit floats for times. The resulting precision will be about 0.1 micro-seconds or better, which is sufficient for any high-level analysis (including milli-second pulsars).
  • Do not use 32-bit floats for times. If you do, times will be incorrect at the 1 to 100 second level.
  • More than 64-bit float precision, or a combination of two floating point numbers, is not necessary.

For data acquisition and low-level analysis (event triggering, traces, …), IACTs require nanosecond precision or better. There, the simple advice to use 64-bit floats representing seconds with respect to a single reference time doesn’t work! One either needs several reference times (e.g. per observation) or two integer or float values. This is not covered by this spec.

The time precision obtained with a single 32-bit or 64-bit float can be computed with this function:

import numpy as np

def time_precision(time_range, float_precision):
    """Compute time precision (seconds) in float computations.

    For a given `time_range` and `float_precision`, the `time_precision`
    is computed as the smallest time difference corresponding to the
    float precision.

    time_range -- (IN) Time range of application (years)
    float_precision -- (IN) {32, 64} Floating point precision
    time_precision -- (OUT) Time precision (seconds)
    """
    YEAR_TO_SEC = 31557600  # Julian year (365.25 days) in seconds
    dtype = {32: np.float32, 64: np.float64}[float_precision]
    t1 = dtype(YEAR_TO_SEC * time_range)
    t2 = np.nextafter(t1, np.finfo(dtype).max)
    print('Time range: {} years, float precision: {} bit => time precision: {:.3g} seconds.'
          ''.format(time_range, float_precision, t2 - t1))

>>> time_precision(10, 32)
Time range: 10 years, float precision: 32 bit => time precision: 32 seconds.
>>> time_precision(10, 64)
Time range: 10 years, float precision: 64 bit => time precision: 5.96e-08 seconds.

Coordinates

This section describes the sky coordinates in use by science tools. It is referenced from the description of data formats to explain the exact meaning of the coordinates stored.

We don’t have a separate section for world coordinate systems (WCS), pixel coordinates and projections; these are covered here as well (see FITS WCS and WCSLIB for references).

We only discuss 2-dimensional sky and image coordinates here; other coordinates, e.g. time or an energy axis, aren’t covered.

Some conventions are adopted from astropy.coordinates, which is a Python wrapper of the IAU SOFA C time and coordinate library, which is the authoritative implementation of the IAU Standards of Fundamental Astronomy. In some cases code examples are given using astropy.coordinates to obtain a reference value that can be used to check a given software package (in case it’s not based on astropy.coordinates).

RA / DEC

The most common way to give sky coordinates is as right ascension (RA) and declination (DEC) in the equatorial coordinate system.

Actually, there are several equatorial coordinate systems in use, the most common ones being FK4, FK5 and ICRS. If you’re interested in learning more about these and other astronomical coordinate systems, have a look at the Explanatory Supplement to the Astronomical Almanac.

But in practice it’s pretty simple: when someone gives or talks about RA / DEC coordinates, they mean either ICRS or FK5 J2000 coordinates. The difference between the two is at the sub-arcsecond level over the whole sky, i.e. irrelevant for gamma-ray astronomy.

We recommend that, by default, you assume RA / DEC is in the ICRS frame, which is the default in astropy.coordinates.SkyCoord and also the current standard celestial reference system adopted by the IAU (see Wikipedia - ICRS).

Galactic

The Galactic coordinate system is often used by Galactic astronomers.

Unfortunately, there are slightly different variants in use (usually with differences at the arcsecond level), and there are no standard names for these slightly different Galactic coordinate frames. See here for an open discussion of which Galactic coordinates to support and what to call them in Astropy.

We recommend you use ICRS RA / DEC for precision coordinate computations. If you do use Galactic coordinates, we recommend you compute them like Astropy does (which we believe is the frame most commonly used in the literature and in existing astronomy software).

Neither ICRS nor Galactic coordinates require the specification of an epoch or equinox.

To check your software, you can use the (l, b) = (0, 0) position:

>>> from astropy.coordinates import SkyCoord
>>> SkyCoord(0, 0, unit='deg', frame='galactic')
<SkyCoord (Galactic): (l, b) in deg (0.0, 0.0)>
>>> SkyCoord(0, 0, unit='deg', frame='galactic').icrs
<SkyCoord (ICRS): (ra, dec) in deg (266.40498829, -28.93617776)>

Alt / Az

The horizontal coordinate system is the one connected to an observer at a given location on Earth and a given point in time.

  • Azimuth is oriented east of north (i.e. north is at 0 deg, east at 90 deg, south at 180 deg and west at 270 deg). This is the convention used by astropy.coordinates.AltAz and quoted as the most common convention in astronomy on Wikipedia (see horizontal coordinate system).
  • The zenith angle is defined as the angular separation from the zenith, which is the direction defined by the line connecting the Earth’s centre and the observer. Altitude and elevation are the same thing, and are defined as 90 deg minus the zenith angle. The reason to define altitude like this, instead of as the angle above the horizon, is that Earth models usually aren’t perfect spheres, but ellipsoids, so the zenith direction as defined here isn’t exactly perpendicular to the horizon plane.
  • Unless explicitly specified, Alt / Az should be assumed to not include any refraction corrections, i.e. to be valid assuming no refraction. Usually this can be achieved in coordinate codes by setting the atmospheric pressure to zero, i.e. turning the atmosphere off.

Here’s some Astropy coordinates code that shows how to convert back and forth between ICRS and AltAz coordinates (the default pressure is set to zero in Astropy, i.e. this is without refraction corrections):

import astropy.units as u
from astropy.time import Time
from astropy.coordinates import Angle, SkyCoord, EarthLocation, AltAz

# Take any ICRS sky coordinate
icrs = SkyCoord.from_name('crab')
print('RA = {pos.ra.deg:10.5f}, DEC = {pos.dec.deg:10.5f}'.format(pos=icrs))
# RA =   83.63308, DEC =   22.01450

# Convert to AltAz for some random observation time and location
# This assumes pressure is zero, i.e. no refraction
time = Time('2010-04-26', scale='tt')
location = EarthLocation(lon=42 * u.deg, lat=42 * u.deg, height=42 * u.meter)
altaz_frame = AltAz(obstime=time, location=location)
altaz = icrs.transform_to(altaz_frame)
print('AZ = {pos.az.deg:10.5f}, ALT = {pos.alt.deg:10.5f}'.format(pos=altaz))
# AZ =  351.88232, ALT =  -25.56281

# Convert back to ICRS to make sure round-tripping is OK
icrs2 = altaz.transform_to('icrs')
print('RA = {pos.ra.deg:10.5f}, DEC = {pos.dec.deg:10.5f}'.format(pos=icrs2))
# RA =   83.63308, DEC =   22.01450

Field of view

FOV coordinates are currently used in two places in this spec:

  1. Some background models are in the FOV coordinate system and FOV coordinates can also be used for other IRFs.
  2. FOV coordinates appear as optional columns in the events table. While it is possible to compute FOV coordinates from the RA, DEC, TIME columns and the observatory Earth location, some IACTs choose to add FOV coordinate columns to their event lists for convenience.

In gamma-ray astronomy, field of view (FOV) coordinates are sometimes used. The basic idea is to have a coordinate system centered on the array pointing position. We define FOV coordinates here to be spherical coordinates; there is no projection or WCS, only a spherical rotation.

Two ways to give the spherical coordinates are defined:

  1. (LON, LAT) with the pointing position on the equator at (LON, LAT) = (0, 0)
    • LON: Longitude (range -180 deg to +180 deg)
    • LAT: Latitude (range -90 deg to +90 deg)
  2. (THETA, PHI) with the pointing position at the pole THETA=0
    • THETA: Offset angle (range 0 deg to +180 deg)
    • PHI: Position angle (range 0 deg to 360 deg)

Two orientations of the FOV coordinate system are defined:

  1. Aligned with the ALTAZ system
  2. Aligned with the RADEC system

This yields the following possible coordinates:

Field             Description
FOV_ALTAZ_LON     Longitude in ALTAZ FOV system
FOV_ALTAZ_LAT     Latitude in ALTAZ FOV system
FOV_ALTAZ_THETA   Offset angle in ALTAZ FOV system
FOV_ALTAZ_PHI     Position angle in ALTAZ FOV system
FOV_RADEC_LON     Longitude in RADEC FOV system
FOV_RADEC_LAT     Latitude in RADEC FOV system
FOV_RADEC_THETA   Offset angle in RADEC FOV system
FOV_RADEC_PHI     Position angle in RADEC FOV system

  • The FOV offset angle (separation to pointing position) THETA doesn’t depend on the orientation. So in this spec, often simply THETA is used, and that is equal to FOV_ALTAZ_THETA and FOV_RADEC_THETA.
  • The other FOV coordinates depend on the alignment and orientation of a second coordinate system (OTHER, either ALTAZ or RADEC).
    • FOV PHI is counterclockwise from OTHER north, i.e. PHI=0 deg points towards increasing OTHER LAT, and PHI=270 deg towards increasing OTHER LON
    • FOV LON should increase with decreasing OTHER LON
    • FOV LAT should increase with increasing OTHER LAT
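
As an illustration, here is a sketch of computing RADEC-aligned FOV coordinates with astropy's SkyOffsetFrame (the pointing and event positions are hypothetical; note the sign flip needed to match the longitude convention above, since the offset-frame longitude increases with increasing RA):

import astropy.units as u
from astropy.coordinates import SkyCoord, SkyOffsetFrame

pointing = SkyCoord(83.63 * u.deg, 22.01 * u.deg)   # hypothetical pointing position
event = SkyCoord(84.0 * u.deg, 22.5 * u.deg)        # hypothetical event position
fov = event.transform_to(SkyOffsetFrame(origin=pointing))
fov_radec_lon = -fov.lon            # FOV LON increases with decreasing RA
fov_radec_lat = fov.lat             # FOV LAT increases with increasing DEC
theta = pointing.separation(event)  # THETA, independent of the orientation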

In the events table, the column names DETX and DETY are sometimes used. This originates from the OGIP event list standard, which uses these names for “detector coordinates”. Given that IACTs don’t have a detector chip (or at least the FOV coordinates used in high-level analysis are different from the IACT camera coordinates), the definition is ambiguous: both (DETX, DETY) = (FOV_ALTAZ_LON, FOV_ALTAZ_LAT) and (DETX, DETY) = (FOV_RADEC_LON, FOV_RADEC_LAT) have been used.

To resolve this ambiguity, we propose a header key FOVALIGN={ALTAZ,RADEC}, specifying which definition of field-of-view coordinates is used. If the key is not present, FOVALIGN=ALTAZ should be assumed as default.

Given that there is no consensus yet, one suggestion is to avoid putting FOV coordinates in EVENTS, or, if they are added, to clearly state how they are defined.

Earth location

When working with Alt-Az coordinates or very high-precision times, an observatory Earth location is needed. However, note that high-level analysis for most use cases does not need this information.

The FITS standard mentions OBSGEO-X, OBSGEO-Y, OBSGEO-Z header keys, and we might want to consider using those in the future.

For now, as of 2018, however, the existing IACT FITS data uses the following header keys, so their use is encouraged:

  • GEOLON type: float, unit: deg
    • Geographic longitude of array centre
  • GEOLAT type: float, unit: deg
    • Geographic latitude of array centre
  • ALTITUDE type: float, unit: m
    • Altitude of array center above sea level

While it is possible in principle to change this for each FITS file, in practice the observatory or telescope array centre position is something that is chosen once and then used consistently in the event reconstruction and analysis. As an example, H.E.S.S. uses the following location and FITS header keys:

GEOLAT  = -23.2717777777778 / latitude of observatory (deg)
GEOLON  =  16.5002222222222 / longitude of observatory (deg)
ALTITUDE=             1835. / altitude of observatory (m)
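
With astropy, these keys map directly onto an EarthLocation (a sketch using the H.E.S.S. values above):

import astropy.units as u
from astropy.coordinates import EarthLocation

location = EarthLocation(
    lon=16.5002222222222 * u.deg,   # GEOLON
    lat=-23.2717777777778 * u.deg,  # GEOLAT
    height=1835 * u.m,              # ALTITUDE
)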

FITS Multidimensional datasets

As described in the FITS standard and the references therein, there are several ways to serialise multi-dimensional arrays and corresponding axis information in FITS files.

Here we describe the schemes in use in gamma-ray astronomy and give examples.

IMAGE HDU
  • Data array is stored in an IMAGE HDU.
  • Axis information is either stored in the IMAGE HDU header or in extra BINTABLE HDUs, sometimes a mix.
  • Advantage: IMAGE HDUs can be opened up in image viewers like ds9.
  • Disadvantage: axis information is not self-contained; an extra HDU is needed.

Example

E.g. the Fermi-LAT counts cubes or diffuse model spectral cubes are stored in an IMAGE HDU, with the information about the two celestial axes given in WCS header keywords, and the information about the energy axis in an ENERGIES (for the spectral cube) or EBOUNDS (for the counts cube) BINTABLE HDU extension.

$ ftlist gll_iem_v02.fit H

        Name               Type       Dimensions
        ----               ----       ----------
HDU 1   Primary Array      Image      Real4(720x360x30)
HDU 2   ENERGIES           BinTable     1 cols x 30 rows

Let’s have a look at the header of the primary IMAGE HDU.

As you can see, there are three axes.

The first two are Galactic longitude and latitude and the pixel to sky coordinate mapping is specified by header keywords according to the FITS WCS standard.

The energy axis, however, isn’t a valid FITS WCS axis specification. ds9 uses the C????3 keys to infer a WCS mapping of pixels to energies, but that mapping is incorrect. Software that is supposed to work with this axis needs to know to look at the ENERGIES table instead.

$ ftlist gll_iem_v02.fit K
SIMPLE  =                    T / Written by IDL:  Tue Jul  7 15:25:03 2009
BITPIX  =                  -32 /
NAXIS   =                    3 / number of data axes
NAXIS1  =                  720 / length of data axis 1
NAXIS2  =                  360 / length of data axis 2
NAXIS3  =                   30 / length of data axis 3
EXTEND  =                    T / FITS dataset may contain extensions
COMMENT   FITS (Flexible Image Transport System) format is defined in 'Astronomy
COMMENT   and Astrophysics', volume 376, page 359; bibcode: 2001A&A...376..359H
FLUX    =        8.29632317174 /
CRVAL1  =                   0. / Value of longitude in pixel CRPIX1
CDELT1  =                  0.5 / Step size in longitude
CRPIX1  =                360.5 / Pixel that has value CRVAL1
CTYPE1  = 'GLON-CAR'           / The type of parameter 1 (Galactic longitude in
CUNIT1  = 'deg     '           / The unit of parameter 1
CRVAL2  =                   0. / Value of latitude in pixel CRPIX2
CDELT2  =                  0.5 / Step size in latitude
CRPIX2  =                180.5 / Pixel that has value CRVAL2
CTYPE2  = 'GLAT-CAR'           / The type of parameter 2 (Galactic latitude in C
CUNIT2  = 'deg     '           / The unit of parameter 2
CRVAL3  =                  50. / Energy of pixel CRPIX3
CDELT3  =    0.113828620540137 / log10 of step size in energy (if it is logarith
CRPIX3  =                   1. / Pixel that has value CRVAL3
CTYPE3  = 'photon energy'      / Axis 3 is the spectra
CUNIT3  = 'MeV     '           / The unit of axis 3
CHECKSUM= '3fdO3caL3caL3caL'   / HDU checksum updated 2009-07-07T22:31:18
DATASUM = '2184619035'         / data unit checksum updated 2009-07-07T22:31:18
HISTORY From Ring/Hybrid fit with GALPROP 54_87Xexph7S extrapolation
HISTORY Integrated flux (m^-2 s^-1) over all sky and energies:    8.30
HISTORY Written by rings_gll.pro
DATE    = '2009-07-07'         /
FILENAME= '$TEMPDIR/diffuse/gll_iem_v02.fit' /File name with version number
TELESCOP= 'GLAST   '           /
INSTRUME= 'LAT     '           /
ORIGIN  = 'LISOC   '           /LAT team product delivered from the LISOC
OBSERVER= 'MICHELSON'          /Instrument PI
END
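
A sketch of reading this file with astropy.io.fits, taking the energy axis from the ENERGIES table rather than from the C????3 keys:

from astropy.io import fits

with fits.open('gll_iem_v02.fit') as hdus:
    cube = hdus[0].data               # numpy array, shape (30, 360, 720): energy, lat, lon
    energies = hdus['ENERGIES'].data  # the 30 energy values (MeV) for the third axis
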
BINTABLE HDU
  • Data array and axis information is stored in a BINTABLE HDU with one row.
  • This is called the “multidimensional array” convention in appendix B of 1995A&AS..113..159C.
  • The OGIP Calibration Memo CAL/GEN/92-003 has a section on the use of multi-dimensional datasets that describes this format in greater detail.
  • Advantage: everything is contained in one HDU (as many axes and data arrays as you like).
  • Disadvantage: format is a bit unintuitive / header is quite complex / can’t be opened directly in ds9.

Example

Let’s look at an example file in this format, aeff_P6_v1_diff_back.fits, which represents the Fermi-LAT effective area (an old version) as a function of energy and offset.

It follows the OGIP effective area format.

The data array and axis information are stored in one BINTABLE HDU called “EFFECTIVE AREA”, with 5 columns and one row:

$ ftlist aeff_P6_v1_diff_back.fits H

        Name               Type       Dimensions
        ----               ----       ----------
HDU 1   Primary Array      Null Array
HDU 2   EFFECTIVE AREA     BinTable     5 cols x 1 rows

The five columns contain arrays of different lengths that represent:

  • First axis is energy (ENERG_LO and ENERG_HI columns) with 60 bins.
  • Second axis is cosine of theta (CTHETA_LO and CTHETA_HI columns) with 32 bins.
  • First and only data array is effective area (EFFAREA) at the given energy and cosine theta values.
$ ftlist aeff_P6_v1_diff_back.fits C
HDU 2

  Col  Name             Format[Units](Range)      Comment
    1 ENERG_LO           60E [MeV]
    2 ENERG_HI           60E [MeV]
    3 CTHETA_LO          32E
    4 CTHETA_HI          32E
    5 EFFAREA            1920E [m2]

The part that’s most difficult to understand / remember is how the relevant information is encoded in the BINTABLE FITS header.

But note the HDUDOC  = 'CAL/GEN/92-019' key. If you Google CAL/GEN/92-019, you will find that it points to the OGIP effective area format document, which explains in detail what all the other keys mean.

There’s some software (e.g. fv) that understands this way of encoding n-dimensional arrays and axis information in FITS BINTABLEs.

$ ftlist aeff_P6_v1_diff_back.fits[1] K
XTENSION= 'BINTABLE'           / binary table extension
BITPIX  =                    8 / 8-bit bytes
NAXIS   =                    2 / 2-dimensional binary table
NAXIS1  =                 8416 / width of table in bytes
NAXIS2  =                    1 / number of rows in table
PCOUNT  =                    0 / size of special data area
GCOUNT  =                    1 / one data group (required keyword)
TFIELDS =                    5 / number of fields in each row
TTYPE1  = 'ENERG_LO'           /
TFORM1  = '60E     '
TTYPE2  = 'ENERG_HI'           /
TFORM2  = '60E     '
TTYPE3  = 'CTHETA_LO'          /
TFORM3  = '32E     '           /
TTYPE4  = 'CTHETA_HI'          /
TFORM4  = '32E     '           /
TTYPE5  = 'EFFAREA '           /
TFORM5  = '1920E   '
ORIGIN  = 'LISOC   '           / name of organization making this file
DATE    = '2008-05-06T08:56:19.9999' / file creation date (YYYY-MM-DDThh:mm:ss U
EXTNAME = 'EFFECTIVE AREA'     / name of this binary table extension
TUNIT1  = 'MeV     '           /
TUNIT2  = 'MeV     '           /
TUNIT3  = '        '
TUNIT4  = '        '
TUNIT5  = 'm2      '           /
TDIM5   = '(60, 32)'
TELESCOP= 'GLAST   '           /
INSTRUME= 'LAT     '           /
DETNAM  = 'BACK    '
HDUCLASS= 'OGIP    '           /
HDUDOC  = 'CAL/GEN/92-019'     /
HDUCLAS1= 'RESPONSE'           /
HDUCLAS2= 'EFF_AREA'           /
HDUVERS = '1.0.0   '           /
EARVERSN= '1992a   '           /
1CTYP5  = 'ENERGY  '           / Always use log(ENERGY) for interpolation
2CTYP5  = 'COSTHETA'           / Off-axis angle cosine
CREF5   = '(ENERG_LO:ENERG_HI,CTHETA_LO:CTHETA_HI)' /
CSYSNAME= 'XMA_POL '           /
CCLS0001= 'BCF     '           /
CDTP0001= 'DATA    '           /
CCNM0001= 'EFF_AREA'           /
CBD10001= 'VERSION(P6_v1_diff)'
CBD20001= 'CLASS(P6_v1_diff_back)'
CBD30001= 'ENERG(18-560000)MeV'
CBD40001= 'CTHETA(0.2-1)'
CBD50001= 'PHI(0-360)deg'
CBD60001= 'NONE    '
CBD70001= 'NONE    '
CBD80001= 'NONE    '
CBD90001= 'NONE    '
CVSD0001= '2007-01-17'         / Dataset validity start date (UTC)
CVST0001= '00:00:00'           /
CDES0001= 'GLAST LAT effective area' /
EXTVER  =                    1 / auto assigned by template parser
CHECKSUM= 'IpAMIo5LIoALIo5L'   / HDU checksum updated 2008-05-06T08:56:20
DATASUM = '340004495'          / data unit checksum updated 2008-05-06T08:56:20
END
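
A sketch of reading this HDU with astropy.io.fits; note that astropy applies the TDIM5 key, so the EFFAREA cell comes back as a two-dimensional array:

from astropy.io import fits

with fits.open('aeff_P6_v1_diff_back.fits') as hdus:
    table = hdus['EFFECTIVE AREA'].data   # BINTABLE with a single row
    energ_lo = table['ENERG_LO'][0]       # shape (60,), MeV
    ctheta_lo = table['CTHETA_LO'][0]     # shape (32,)
    effarea = table['EFFAREA'][0]         # shape (32, 60), from TDIM5 = '(60, 32)'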

HDU classes

Following NASA’s recommendation (see HFWG Recommendation R8), a hierarchical classification is applied to each HDU within DL3 FITS files, using the HDUCLASS and HDUCLASn keywords.

The different HDUs defined in the current specifications are listed here:

  • EVENTS : Table containing the event lists. See EVENTS.
  • GTI : Table containing the Good Time Intervals (‘GTIs’) for the event list. See GTI.
  • POINTING : Table containing the pointing direction of the telescopes for a number of time stamps. See POINTING.
  • RESPONSE : Table containing any of the different instrument response function components defined in the specs. See IRFs.

The current HDU class scheme used is the following:

  • HDUCLASS : General identifier of the data format. Recommended value: “GADF” (for “gamma-astro-data-formats”)
  • HDUDOC : Link to the DL3 specifications documentation
  • HDUVERS : Version of the DL3 specification format
  • HDUCLAS1 : General type of HDU, currently: EVENTS, GTI or RESPONSE
  • HDUCLAS2 : In case of RESPONSE type, refers to the IRF components stored within the HDU: EFF_AREA, BKG, EDISP or RPSF
  • HDUCLAS3 : In case of RESPONSE type, refers to the way the IRF component was produced (POINT-LIKE or FULL-ENCLOSURE)
  • HDUCLAS4 : In case of RESPONSE type, refers to the name of the specific format

HDUCLAS1   HDUCLAS2   HDUCLAS3         HDUCLAS4
EVENTS
GTI
POINTING
RESPONSE   EFF_AREA   POINT-LIKE       AEFF_2D
                      FULL-ENCLOSURE   AEFF_2D
           EDISP      POINT-LIKE       EDISP_2D
                      FULL-ENCLOSURE   EDISP_2D
           RPSF       FULL-ENCLOSURE   PSF_TABLE
                                       PSF_3GAUSS
                                       PSF_KING
                                       PSF_GTPSF
           BKG        POINT-LIKE       BKG_2D
                                       BKG_3D
                      FULL-ENCLOSURE   BKG_2D
                                       BKG_3D
           RAD_MAX    POINT-LIKE       RAD_MAX_2D
INDEX      OBS
           HDU
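
As an illustration, here is a sketch (with astropy.io.fits; the version string is a hypothetical example) of filling these keys for a full-enclosure effective area HDU:

from astropy.io import fits

header = fits.Header()
header['HDUCLASS'] = ('GADF', 'gamma-astro-data-formats')
header['HDUDOC'] = 'https://github.com/open-gamma-ray-astro/gamma-astro-data-formats'
header['HDUVERS'] = ('0.3', 'hypothetical version string')
header['HDUCLAS1'] = ('RESPONSE', 'general type of HDU')
header['HDUCLAS2'] = ('EFF_AREA', 'IRF component')
header['HDUCLAS3'] = ('FULL-ENCLOSURE', 'IRF production mode')
header['HDUCLAS4'] = ('AEFF_2D', 'specific format name')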

Notes

Here we collect miscellaneous notes that are helpful when reading or working with the specs.

FITS BINTABLE TFORM data type codes

The valid FITS BINTABLE TFORM data type codes are given in Table 18 of the FITS standard.

Information on how to use them correctly via CFITSIO is given in the CFITSIO documentation.

For astropy.io.fits, there are these dicts to translate FITS BINTABLE TFORM codes to Numpy dtype codes:

>>> from astropy.io.fits.column import FITS2NUMPY, NUMPY2FITS
>>> FITS2NUMPY
{'J': 'i4', 'I': 'i2', 'L': 'i1', 'E': 'f4', 'M': 'c16', 'B': 'u1', 'K': 'i8', 'C': 'c8', 'D': 'f8', 'A': 'a'}
>>> NUMPY2FITS
{'i1': 'L', 'c16': 'M', 'i4': 'J', 'f2': 'E', 'i2': 'I', 'b1': 'L', 'i8': 'K', 'u8': 'K', 'u1': 'B', 'u4': 'J', 'u2': 'I', 'c8': 'C', 'f8': 'D', 'f4': 'E', 'a': 'A'}

But normally you should never have to handle these dtypes manually from Python: astropy.io.fits or astropy.table.Table will read and write the TFORM FITS header keys correctly for you.

Glossary

FITS

Flexible Image Transport System http://fits.gsfc.nasa.gov/

HEASARC

High Energy Astrophysics Science Archive Research Centre. http://heasarc.gsfc.nasa.gov/

OGIP FITS Standards

The FITS Working Group in the Office of Guest Investigators Program has established conventions for FITS files for high-energy astrophysics projects. http://hesperia.gsfc.nasa.gov/rhessidatacenter/software/ogip/ogip.html

CALDB

The HEASARC’s calibration database (CALDB) system stores and indexes datasets associated with the calibration of high energy astronomical instrumentation. http://heasarc.gsfc.nasa.gov/docs/heasarc/caldb/caldb_intro.html

IACT

IACT = imaging atmospheric Cherenkov telescope (see wikipedia article).

Observation = Run

For IACTs, observations are usually conducted by pointing the array (or a sub-array) for a period of time (typically half an hour for current IACTs) at a fixed location in celestial coordinates (i.e. the telescopes slew in horizontal Alt/Az coordinates to keep the pointing position RA/DEC at the centre of the field of view).

For current IACTs the term “run” is more common than “observation”, but for CTA the term “observation” will probably be used, so it’s recommended to use “observation” in these format specs.

Off Observation

The term “off observation” or “off run” refers to observations where most of the field of view contains no gamma-ray emission (apart from a possible diffuse extragalactic isotropic component, which is supposed to be very weak at TeV energies).

AGN observations are sometimes also considered “off observations”, because the fraction of the field of view containing their gamma-ray emission is often very small, and most of the field of view is empty.

For further information on background modeling, see Berge (2007).

Events

This document describes the format to store DL3 event data for gamma-ray instruments.

The main table is EVENTS, which at the moment contains not only event parameters, but also key information about the observation needed to analyse the data such as the pointing position or the livetime.

The GTI table gives the “good time intervals”. The EVENTS table should match the GTI table, i.e. contain the relevant events within these time intervals.

No requirement is stated yet on whether event times must be sorted chronologically; this is under discussion and might be added in a future version of the spec. For now, sorting events by time when producing EVENTS tables is highly recommended.

The POINTING table defines the pointing position at given times, allowing the pointing position to be interpolated for any time. It isn’t in use yet; science tools currently access the pointing position from the EVENTS header and only support fixed-pointing observations.

We note that the EVENTS, GTI and POINTING tables are likely to change significantly in the future. Discussion on observing modes is ongoing; a new HDU might be introduced that stores the “observation” (sometimes called “tech”) information, e.g. the observation mode and pointing information. Other major decisions, like whether there will be one set of IRFs per OBS_ID (like we have now) or per GTI, are being discussed in CTA. Another discussion point is how to handle trigger dead times. Currently, science tools have to access DEADC or LIVETIME from the event header and combine that with the GTIs if they want to analyse parts of observations. One option could be to absorb the dead-time correction into the effective areas; another could be to add dead-time correction factors to the GTI tables.

EVENTS

The EVENTS extension is a binary FITS table that contains an event list. Each row of the table provides information that characterises one event. The mandatory and optional columns of the table are listed below. In addition, a list of header keywords providing metadata is specified. Also here there are mandatory and optional keywords. The recommended extension name of the binary table is EVENTS.

Mandatory columns

We follow the OGIP event list standard.

  • EVENT_ID type: int64
    • Event identification number at the DL3 level. See notes on EVENT_ID below.
  • TIME type: float64, unit: s
    • Event time (see Time)
  • RA type: float, unit: deg
    • Reconstructed event Right Ascension (see RA / DEC).
  • DEC type: float, unit: deg
    • Reconstructed event Declination (see RA / DEC).
  • ENERGY type: float, unit: TeV
    • Reconstructed event energy.
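
For illustration, here is a minimal sketch of writing such a table with astropy.table (all values are hypothetical; the mandatory header keywords listed below still have to be added to the HDU header):

import astropy.units as u
from astropy.table import Table

events = Table()
events['EVENT_ID'] = [1, 2, 3]                    # int64
events['TIME'] = [100.0, 200.0, 300.0] * u.s      # seconds since the reference time
events['RA'] = [83.63, 83.64, 83.62] * u.deg
events['DEC'] = [22.01, 22.02, 22.00] * u.deg
events['ENERGY'] = [0.5, 1.2, 3.4] * u.TeV
events.meta['EXTNAME'] = 'EVENTS'
events.write('events.fits', overwrite=True)
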
Optional columns

Note

None of the following columns is required to be part of an EVENTS extension. Any software using these columns should first check whether the columns exist, and warn if they are absent. Any software ignoring these columns should make sure that their presence does not degrade its functioning.

  • EVENT_TYPE type: bit field (in FITS tform=32X)
    • Event quality partition.
  • MULTIP type: int
    • Telescope multiplicity. Number of telescopes that have seen the event.
  • GLON type: float, unit: deg
    • Reconstructed event Galactic longitude (see Galactic).
  • GLAT type: float, unit: deg
    • Reconstructed event Galactic latitude (see Galactic).
  • ALT type: float, unit: deg
    • Reconstructed event altitude (see Alt / Az).
  • AZ type: float, unit: deg
    • Reconstructed event azimuth (see Alt / Az).
  • DETX type: float, unit: deg
    • Reconstructed event field of view longitude (see Field of view).
  • DETY type: float, unit: deg
    • Reconstructed event field of view latitude (see Field of view).
  • THETA type: float, unit: deg
    • Reconstructed field of view offset angle (see Field of view).
  • PHI type: float, unit: deg
    • Reconstructed field of view position angle (see Field of view).
  • GAMMANESS type: float
    • Classification score of a signal / background separation. SHOULD be between 0 and 1, with higher values indicating larger confidence that the event was produced by a gamma ray.
  • DIR_ERR type: float, unit: deg
    • Direction error of reconstruction
  • ENERGY_ERR type: float, unit: TeV
    • Error on reconstructed event energy
  • COREX type: float, unit: m
    • Reconstructed core position X of shower
  • COREY type: float, unit: m
    • Reconstructed core position Y of shower
  • CORE_ERR type: float, unit: m
    • Error on reconstructed core position of shower
  • XMAX type: float, unit: radiation lengths
    • First interaction depth
  • XMAX_ERR type: float, unit: radiation lengths
    • Error on first interaction depth
  • HIL_MSW type: float
    • Hillas mean scaled width
  • HIL_MSW_ERR type: float
    • Hillas mean scaled width error
  • HIL_MSL type: float
    • Hillas mean scaled length
  • HIL_MSL_ERR type: float
    • Hillas mean scaled length error

Mandatory header keywords

The standard FITS reference time header keywords should be used (see Formats). An observatory Earth location should be given as well (see Earth location).

  • HDUCLASS type: string
    • Signal conformance with HEASARC/OGIP conventions (option: ‘OGIP’). See HDU classes.
  • HDUDOC type: string
    • Reference to documentation where data format is documented. See HDU classes.
  • HDUVERS type: string
    • Version of the format (e.g. ‘1.0.0’). See HDU classes.
  • HDUCLAS1 type: string
    • Primary extension class (option: ‘EVENTS’). See HDU classes.
  • OBS_ID type: int
    • Unique observation identifier (Run number)
  • TSTART type: float, unit: s
    • Start time of observation (relative to reference time, see Time)
  • TSTOP type: float, unit: s
    • End time of observation (relative to reference time, see Time)
  • ONTIME type: float, unit: s
    • Total good time (sum of length of all Good Time Intervals). If a Good Time Interval (GTI) table is provided, ONTIME should be calculated as the sum of those intervals. Corrections for instrumental dead time effects are NOT included.
  • LIVETIME type: float, unit: s
    • Total time (in seconds) on source, corrected for the total instrumental dead time effect.
  • DEADC type: float
    • Dead time correction, defined as LIVETIME/ONTIME. Its value is in the interval [0,1]. Defined to be 0 if ONTIME=0.
  • OBS_MODE type: string
    • Observation mode. See notes on OBS_MODE below.
  • RA_PNT type: float, unit: deg
    • Pointing Right Ascension (see RA / DEC). Not mandatory if OBS_MODE=DRIFT, but average values could optionally be provided.
  • DEC_PNT type: float, unit: deg
    • Pointing declination (see RA / DEC). Not mandatory if OBS_MODE=DRIFT, but average values could optionally be provided.
  • ALT_PNT type: float, unit: deg
    • Pointing Altitude (see Alt / Az). Only mandatory if OBS_MODE=DRIFT
  • AZ_PNT type: float, unit: deg
    • Pointing azimuth (see Alt / Az). Only mandatory if OBS_MODE=DRIFT
  • EQUINOX type: float
    • Equinox in years for the celestial coordinate system in which positions given in either the header or data are expressed (options: 2000.0). See also HFWG Recommendation R3 for the OGIP standard.
  • RADECSYS type: string
    • Stellar reference frame used for the celestial coordinate system in which positions given in either the header or data are expressed. (options: ‘ICRS’, ‘FK5’). See also HFWG Recommendation R3 for the OGIP standard.
  • ORIGIN type: string
    • Organisation that created the FITS file. This can be the same as TELESCOP (e.g. “HESS”), but it could also be different if an organisation has multiple telescopes (e.g. “NASA” or “ESO”).
  • TELESCOP type: string
    • Telescope (e.g. ‘CTA’, ‘HESS’, ‘VERITAS’, ‘MAGIC’, ‘FACT’)
  • INSTRUME type: string
    • Instrument used to acquire the data contained in the file. Each organisation and telescope has to define this. E.g. for CTA it could be ‘North’ and ‘South’, or sub-array configurations; this has not been defined yet.
  • CREATOR type: string
    • Software that created the file. When appropriate, the value of the CREATOR keyword should also reference the specific version of the program that created the FITS file. It is intended that this keyword should refer to the program that originally defined the FITS file structure and wrote the contents. If a FITS file is subsequently copied largely intact into a new FITS file by another program, then the value of the CREATOR keyword should still refer to the original program. HISTORY keywords should be used instead to document any further processing that is performed on the file after it is created. For more reading on the OGIP standard, see here.

Optional header keywords
  • OBSERVER type: string
    • Name of observer. This could be for example the PI of a proposal.
  • CREATED type: string
    • Time when file was created in ISO standard date representation “ccyy-mm-ddThh:mm:ss” (UTC)
  • OBJECT type: string
    • Observed object (e.g. ‘Crab’)
  • RA_OBJ type: float, unit: deg
    • Right ascension of OBJECT
  • DEC_OBJ type: float, unit: deg
    • Declination of OBJECT
  • EV_CLASS type: str
    • Event class (see notes on EV_CLASS and EVENT_TYPE below)
  • TELAPSE type: float, unit: s
    • Time interval between start and stop time (TELAPSE=TSTOP-TSTART). Any gaps due to bad weather, or high background counts and/or other anomalies, are included.

Warning

Keywords below seem to be pretty low-level and eventually instrument specific. It needs to be discussed whether a recommendation on these keywords should be made, or whether the definition should be left to the respective consortia.

  • HDUCLAS2 type: string
    • Secondary extension class (option: ‘ACCEPTED’). See HDU classes.
  • TELLIST type: string
    • Telescope IDs in observation (e.g. ‘1,2,3,4’)
  • N_TELS type: int
    • Number of observing telescopes
  • TASSIGN type: string
    • Place of time reference (‘Namibia’)
  • DST_VER type: string
    • Version of DST/Data production
  • ANA_VER type: string
    • Reconstruction software version
  • CAL_VER type: string
    • Calibration software version
  • CONV_DEP type: float
    • Convergence depth (0 for parallel pointing)
  • CONV_RA type: float, unit: deg
    • Convergence Right Ascension
  • CONV_DEC type: float, unit: deg
    • Convergence Declination
  • TRGRATE type: float, unit: Hz
    • Mean system trigger rate
  • ZTRGRATE type: float, unit: Hz
    • Zenith equivalent mean system trigger rate
  • MUONEFF type: float
    • Mean muon efficiency
  • BROKPIX type: float
    • Fraction of broken pixels (0.15 means 15% broken pixels)
  • AIRTEMP type: float, unit: deg C
    • Mean air temperature at ground during the observation.
  • PRESSURE type: float, unit: hPa
    • Mean air pressure at ground during the observation.
  • RELHUM type: float
    • Relative humidity
  • NSBLEVEL type: float, unit: a.u.
    • Measure for night sky background level

Notes

This section contains explanatory notes on some of the columns and header keys mentioned above.

EVENT_ID

Most analyses with high-level science tools don’t need EVENT_ID information. But being able to uniquely identify every event is important, e.g. when comparing the high-level reconstructed event parameters (RA, DEC, ENERGY) for different calibrations, reconstructions or gamma-hadron separations.

Assigning a unique EVENT_ID during data taking can be difficult or impossible. E.g. in H.E.S.S. we have two numbers BUNCH_ID_HESS and EVENT_ID_HESS that only together uniquely identify an event within a given run (i.e. OBS_ID). Probably the scheme to uniquely identify events at the DL0 level for CTA will be even more complicated, because of the much larger number of telescopes and events.

So given that data taking and event identification is different for every IACT at low data levels and is already fixed for existing IACTs, we propose here to have an EVENT_ID that is simpler and works the same for all IACTs at the DL3 level.

As an example: for H.E.S.S. we achieve this by using an INT64 for EVENT_ID and storing EVENT_ID = (BUNCH_ID_HESS << 32) | (EVENT_ID_HESS), i.e. using the upper bits to contain the low-level bunch ID and the lower bits to contain the low-level event ID. This encoding is unique and reversible, i.e. it’s easy to go back to BUNCH_ID_HESS and EVENT_ID_HESS for a given EVENT_ID, and to do low-level checks (e.g. look at the shower images for a given event that behaves strangely in reconstructed high-level parameters).
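
A sketch of this packing scheme (bunch_id and event_id stand for the low-level BUNCH_ID_HESS and EVENT_ID_HESS values mentioned above):

import numpy as np

def pack_event_id(bunch_id, event_id):
    """Pack two 32-bit identifiers into one int64 DL3 EVENT_ID."""
    return (np.int64(bunch_id) << np.int64(32)) | np.int64(event_id)

def unpack_event_id(dl3_event_id):
    """Recover (bunch_id, event_id) from a packed EVENT_ID."""
    return int(dl3_event_id) >> 32, int(dl3_event_id) & 0xFFFFFFFF

assert unpack_event_id(pack_event_id(7, 42)) == (7, 42)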

EV_CLASS and EVENT_TYPE

Currently in this format specification, event class EV_CLASS is a header key (i.e. the same for all events in a given event list) and EVENT_TYPE is a bitfield column (i.e. can have a different value for each event). Both are optional at this time, only used as provenance information, not by science tools to make any decisions how to analyse the data.

The reason for this is simply that we have not agreed yet on a scheme what event class and event type means, and how it should be used by science tools for analysis. Developing this will be one of the major topics for the next version of the spec. It is likely that a proper definition of event classes and types will not be compatible with what is currently defined here, so not filling EV_CLASS and EVENT_TYPE when creating DL3 data is not a bad idea.

To summarise the discussions on this important point in the past years: they were mostly based on looking at what Fermi-LAT is doing, and on some prototyping in H.E.S.S. to export DL3 data to FITS.

The scheme Fermi-LAT uses for event classes and event types is summarised in the Fermi-LAT documentation. There, event classes and types are key parts of the data model, used for EVENT to IRF association, and even end users need to learn about them and pass event class and type information when using the science tools.

One option could be to mostly adopt what Fermi-LAT does for IACTs. However, a major difference is that Fermi-LAT is a more stable detector that has a CALDB with a very limited number of IRFs that can be used for all data, whereas for IACTs, with changing telescope configurations, degrading mirrors, changing atmosphere, zenith angle, …, we will most likely have to produce per-observation IRFs. Then it’s easier to bundle the IRFs with the EVENTS in one file, and the use of event class and type to link EVENTS and IRFs is no longer needed.

An alternative scheme is to use the term “event class” to describe a given analysis configuration. This is e.g. how we currently use EV_CLASS in H.E.S.S.: we fill values like “standard” or “hard” or “loose” to describe a given full configuration that is the result of a calibration, event reconstruction and gamma-hadron separation pipeline. This is similar to what Fermi-LAT does, except less sophisticated: the event classes are completely independent, there is no nesting, and separate event and IRF files are produced for each class/configuration. To analyse data from a given class, the user chooses which set of files to download (e.g. “loose” for pulsars or “hard” for a bright source where a good PSF is needed), and then the science tools don’t need to do anything with EV_CLASS; it is just provenance information.

Some people are experimenting with the use of EVENT_TYPE in a similar way as Fermi-LAT, e.g. to have event quality partitioning based on the number of telescopes that saw a given event, or other criteria. Again, it is left to users to split events by EVENT_TYPE and to produce IRFs for each event type and pass those for a joint fit to the science tools, as there is no agreement or implementation yet in the science tools to support EVENT_TYPE directly.

So to conclude and summarise again: EV_CLASS and EVENT_TYPE as mentioned here in this spec are optional and very preliminary. Defining event class and type for IACTs needs more prototyping by the science tools and current IACTs and CTA and discussion, and then a proposal for a specification.

OBS_MODE

The keyword OBS_MODE specifies the mode of the observation. This is used in cases where the instrument or detector can be configured to operate in different modes which significantly affect the resulting data, or to accommodate different types of instruments. More details can be found in the OGIP documentation.

Most IACT data will be POINTING, indicating a constant pointing in equatorial coordinates.

In addition to the OGIP-defined values (POINTING, RASTER, SLEW and SCAN), we define the option DRIFT to accommodate ground-based wide-field instruments, for which the local zenith/azimuth coordinates remain constant. In this case, the header keywords RA_PNT and DEC_PNT are no longer mandatory; instead, ALT_PNT and AZ_PNT are required.

It is likely that OBS_MODE in the future will be a key piece of information in the DL3 data model, defining the observation mode (e.g. pointed, divergent, slewing, …) and being required to analyse the data correctly.

GTI

The GTI extension is a binary FITS table that contains the Good Time Intervals (‘GTIs’) for the event list. A general description of GTIs can be found in the OGIP GTI standard.

This HDU contains two mandatory columns named START and STOP. At least one row, containing the start and end time of the observation, must be present. The values are in units of seconds with respect to the reference time defined in the header (keywords MJDREFI and MJDREFF). This extension allows for a detailed handling of good time intervals, i.e. excluding periods with cloud cover or lightning during one observation.

High-level science tools could modify the GTIs according to user parameters. See e.g. gtmktime for an application example from the Fermi Science Tools.

Mandatory columns
  • START type: float64, unit: s
    • Start time of good time interval (see Time)
  • STOP type: float64, unit: s
    • End time of good time interval (see Time)
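
For example, the ONTIME header key of the corresponding EVENTS table can be computed from these columns (a sketch; 'events.fits' is a hypothetical file name):

import numpy as np
from astropy.io import fits

with fits.open('events.fits') as hdus:
    gti = hdus['GTI'].data
    ontime = np.sum(gti['STOP'] - gti['START'])  # total good time in seconds
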
Mandatory header keywords

The standard FITS reference time header keywords should be used (see Formats).

POINTING

Warning

This is a first draft proposal of a pointing table. It is not being used yet by data producers or science tools. Please note that the format is likely subject to change.

The POINTING extension is a binary FITS table that contains the pointing direction of the telescopes for a number of time stamps. A pointing is here defined as the centre of the field of view (or the centre of the camera coordinates). In reality, the telescopes may all point to different positions (for example in divergent pointing mode). The main purpose of the POINTING extension is to provide time-dependent information on how to transform between celestial and terrestrial coordinates.

See also HFWG Recommendation R3 for the OGIP standard.

Mandatory columns
  • TIME type: float64, unit: s
    • Pointing time (see Time)
  • RA_PNT type: float, unit: deg
    • Pointing Right Ascension (see RA / DEC).
  • DEC_PNT type: float, unit: deg
    • Pointing Declination (see RA / DEC).

Optional columns
  • ALT_PNT type: float, unit: deg
    • Pointing altitude (see Alt / Az).
  • AZ_PNT type: float, unit: deg
    • Pointing azimuth (see Alt / Az).

Mandatory header keywords

The standard FITS reference time header keywords should be used (see Formats). An observatory Earth location should be given as well (see Earth location).

IRFs

The instrument response function (IRF) formats currently in use for gamma-ray instruments and, in particular, imaging atmospheric Cherenkov telescopes (IACTs) store the response in FITS binary tables using the “multidimensional array” convention (binary tables with a single row and array columns) described at BINTABLE HDU. This format has been used for calibration data and IRFs of X-ray instruments, as well as for the IRFs that are distributed with the Fermi-LAT science tools.

Two different approaches are used to store the IRF of gamma-ray instruments:

  • Full-enclosure IRF: all IRF components are stored as a function of the offset with respect to the source position.
  • Point-like IRF: IRF components are calculated after applying a cut in direction offset. This format has been used by the current generation of IACTs to perform spectral analysis and light curves.

At the moment (November 2015), this format is used by H.E.S.S., MAGIC and VERITAS and supported by Gammapy and Gammalib and is being proposed for DL3 IRF (i.e. the format distributed to end users and used by the science tools for CTA).

IRF components

The IRF is made up of several components, described here:

Effective Area

Effective area combines the detection efficiency of an instrument with its observable area. It can be interpreted as the area a perfect detector directly measuring gamma rays would have.

\[A_{\mathrm{eff}}(E, ...) = p(E, ...) \cdot A,\]

with \(A\) the total observable area and \(p(E, ...)\) the detection probability for a gamma ray of a given energy and possibly other parameters, like the position in the field of view.

Effective area has to be calculated from simulated events in a discretised manner. \(p\) is estimated in bins of the dependent variables by dividing the number of detected and reconstructed events by the number of all simulated events. The effective area in bin \(i\) is then

\[A_{\mathrm{eff}, i} = \frac{N_{\mathrm{detected}, i}}{N_{\mathrm{simulated}, i}} \cdot A,\]

Usually, \(A\) is the area in which the events were simulated, for CORSIKA simulations this will be \(A = \pi R_{\mathrm{max}}^2\), with \(R_{\mathrm{max}}\) being the maximum scatter radius.

Calculation of the effective area can be done at several analysis steps, e.g. after trigger, after image cleaning or, most importantly for the DL3 event lists, after applying all event selection criteria, including gamma-hadron separation and, in the case of point-like IRFs, the \(\theta^2\) cut.

Since, in general, there is no reconstructed energy for undetected events, the effective area can only be expressed in bins of the true simulated energy.
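
A sketch of this binned estimate, assuming per-energy-bin event counts and a CORSIKA-style maximum scatter radius (all names are hypothetical):

import numpy as np

def effective_area(n_detected, n_simulated, r_max):
    """Effective area per true-energy bin, in the same squared unit as r_max."""
    a_sim = np.pi * r_max ** 2  # simulated area A = pi * R_max**2
    return a_sim * np.asarray(n_detected) / np.asarray(n_simulated)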

The proposed effective area format follows mostly the OGIP effective area format document.

For the moment, the format for the effective area works to a satisfactory level. Nevertheless, the variation of the energy threshold across the FoV, for instance, is not taken into account. However, since threshold definitions are currently not unified, the inclusion of this variation would still be arbitrary and subject to the analysis chain; in addition, this feature is currently not supported in open-source tools. We therefore keep the option to add an individual extension listing the energy threshold variation across the FoV. This will likely be included in future releases.

Energy Dispersion

The energy dispersion information is stored in a FITS file with one required extension (HDU). The stored quantity is \(\frac{dP}{d\mu}\), a PDF for the energy migration

\[\mu = \frac{E_{\mathrm{reco}}}{E_{\mathrm{true}}}\]

as a function of true energy and offset. It should be normalized to unity, i.e.,

\[\int_0^\infty \frac{dP}{d\mu}\,d\mu = 1\,.\]

The migration range covered in the file must be large enough to make this possible (suggestion: \(0.2 < \mu < 5\)).

Transformation

For some analyses, for example when extracting an RMF, it is necessary to calculate the detector response \(R(I,J)\), i.e. the probability to find an energy from within a given true energy bin \(I\) of width \(\Delta E_{\mathrm{true}}\) within a certain reconstructed energy bin \(J\) of width \(\Delta E_{\mathrm{reco}}\). In order to do so, the following integration has to be performed (for a fixed offset):

\[R(I,J) = \frac{ \int_{\Delta E_{\mathrm{true}}} R(I,E_{\mathrm{true}})\ d E_{\mathrm{true}}}{\Delta E_{\mathrm{true}}},\]

where

\[R(I,E_{\mathrm{true}}) = \int_{\mu(\Delta E_{\mathrm{reco}})} \mathrm{PDF}(E_{\mathrm{true}}, \mu)\ d \mu\]

is the probability to find a given true energy \(E_{\mathrm{true}}\) in the reconstructed energy band J.
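
A rough numerical sketch of this integration (all names are hypothetical, and numpy arrays are assumed; as simplifications, it evaluates the PDF at a single true energy per bin instead of averaging over the bin, and integrates over the migration axis with the trapezoidal rule):

import numpy as np

def response_matrix(dp_dmu, mu, e_true, e_reco_edges):
    """R[i, j]: probability for true energy e_true[i] to be reconstructed
    in energy bin j, from the migration PDF dp_dmu[i] sampled on mu."""
    r = np.zeros((len(e_true), len(e_reco_edges) - 1))
    for i, et in enumerate(e_true):
        for j in range(len(e_reco_edges) - 1):
            # migration values that map into reconstructed energy bin j
            in_bin = (mu >= e_reco_edges[j] / et) & (mu < e_reco_edges[j + 1] / et)
            r[i, j] = np.trapz(dp_dmu[i][in_bin], mu[in_bin])
    return r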

Point spread function
Introduction

The point spread function (PSF) (Wikipedia - PSF) represents the spatial probability distribution of reconstructed event positions for a point source. So far we’re only considering radially symmetric PSFs here.

Probability distributions
  • \(dP/d\Omega(r)\), where \(dP\) is the probability to find an event in a solid angle \(d\Omega\) at an offset \(r\) from the point source. This is the canonical form we use and the values we store in files.
  • Often, when comparing observed event distributions with a PSF model, the \(dP/dr^2\) distribution in equal-width bins in \(r^2\) is used. The relation is \(d\Omega = \pi dr^2\), i.e. \(dP/d\Omega=(1/\pi)(dP/dr^2)\).
  • Sometimes, the distribution \(dP/dr(r)\) is used. The relation is \(dP/dr = 2 \pi r dP/d\Omega\).
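For illustration, a minimal sketch of these conversions (angles in radians, small-angle approximation; function names are hypothetical):

import numpy as np

def dp_domega_to_dp_dr(dp_domega, r):
    """Convert dP/dOmega [sr^-1] to dP/dr via dP/dr = 2 pi r dP/dOmega."""
    return 2 * np.pi * r * dp_domega

def dp_dr2_to_dp_domega(dp_dr2):
    """Convert dP/dr^2 to dP/dOmega via dOmega = pi dr^2."""
    return dp_dr2 / np.pi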

TODO: explain “encircled energy” = “encircled counts” = “cumulative” representation of PSF and define containment fraction and containment radius.

Normalisation

PSFs must be normalised to integrate to total probability 1, i.e.

\[\int dP/d\Omega(r) d\Omega = 1\,.\]

This implies that the PSF producer is responsible for choosing the Theta range and normalising. I.e. it’s OK to choose a theta range that contains only 95% of the PSF, and then the integral will be 0.95.

We recommend everyone store PSFs so that truncation is completely negligible, i.e. the containment should be 99% or better for all of parameter space.

Comments
  • Usually the PSF is derived from Monte Carlo simulations, but in principle it can be estimated from bright point sources (AGN) as well.
  • Tools should assume the PSF is well-sampled and noise-free. I.e. if limited event statistics in the PSF computation is an issue, it is up to the PSF producer to denoise it to an acceptable level.
Background

One method of background modeling for IACTs is to construct spatial and / or spectral model templates of the irreducible cosmic-ray background for a given reconstruction and gamma-hadron separation from off observations. These templates can then be used as an ingredient to model the background in observations that contain gamma-ray emission of interest, or to compute the sensitivity for that set of cuts.

Note

Generating background models requires the construction of several intermediate products (counts and livetime histograms, both filled by cutting out exclusion regions around sources like AGN) to arrive at the models containing an absolute rate described here. At this time we don't specify a format for those intermediate products.

Note

Background models are sometimes considered an instrument response function (IRF) and sometimes not (e.g. when the background is estimated from different parts of the field of view for the same observation).

Here we have the background format specifications listed under IRFs, simply because the storage format is very similar to the other IRFs (e.g. effective area) and we didn’t want to introduce a new top-level section besides IRFs.

IRF axes

Most IRFs are dependent on parameters, and the 1-dimensional parameter arrays are stored in columns. The following names are recommended:

  • For energy grids, see here for basic recommendations. Column names should be ENERGY or ENERG_LO, ENERG_HI because that is used (consistently I think) for OGIP and Fermi-LAT. For separate HDUs, the extension names should be ENERGIES or EBOUNDS (used by Fermi-LAT consistently).
  • Sky coordinates should be called RA, DEC, GLON, GLAT, ALT, AZ (see Coordinates)
  • Field of view coordinates DETX, DETY or THETA, PHI for offset and position angle in the field of view (see Field of view).
  • Offset wrt. the source position should be called RAD (this is what the OGIP PSF formats use).

In the specific case of point-like IRFs:

  • The energy-dependent radius of the selected region of interest should be RAD_MAX

The IRF format specifications mention a recommended axis format and axis units. But tools should not depend on this and instead:

  • Use the axis order specified by the CREF header keyword (see BINTABLE HDU)
  • Use the axis unit specified by the CUNIT header keywords (see BINTABLE HDU)

Full-enclosure IRFs

The full-enclosure IRF format has been used for calibration data and IRFs of X-ray instruments, as well as for the IRFs that are distributed with the Fermi-LAT science tools.

Any full-enclosure IRF component should contain the header keyword:

  • HDUCLAS3 = FULL-ENCLOSURE

From here on, the specific format of each IRF component:

Effective area format

Here we specify the format to store the effective area of a full-enclosure IRF. Effective area is always stored as a function of true energy (see Effective Area).

AEFF_2D
Effective Area vs true energy

The effective area as a function of the true energy and offset angle is saved as a BINTABLE HDU with required columns listed below.

Columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • True energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
    • Field of view offset axis
  • EFFAREA – ndim: 2, unit: m^2
    • Effective area value as a function of true energy and field of view offset

Recommended axis order: ENERGY, THETA

Header keywords:

If the IRFs are only known to be “valid” or “safe” to use within a given energy range, that range can be given via the following two keywords. The keywords are optional, not all telescopes use the concept of a safe range; e.g. in CTA at this time this hasn’t been defined. Note that a proper scheme to declare IRF validity range (e.g. masks or weights, or safe cuts that depend on other parameters such as FOV offset) is not available yet.

  • LO_THRES type: float, unit: TeV
    • Low energy threshold
  • HI_THRES type: float, unit: TeV
    • High energy threshold

If the effective area corresponds to a given observation with an OBS_ID, that OBS_ID should be given as a header keyword. Note that this is not always the case, e.g. sometimes IRFs are simulated and produced for instruments that haven’t even been built yet, and then used to simulate different kinds of observations.

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

The recommended EXTNAME keyword is “EFFECTIVE AREA”.

Example data file: here.
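For illustration, a sketch of how such an HDU could be written with astropy; the axis values are arbitrary, and the HDUCLAS1 and HDUCLAS4 values are assumptions based on the HDU classes scheme referenced above (HDUCLAS2 = "EFF_AREA" and HDUCLAS3 = "FULL-ENCLOSURE" are cited elsewhere in this document):

import numpy as np
from astropy.io import fits

e_edges = np.logspace(-1, 2, 31)      # TeV, 30 true energy bins
th_edges = np.linspace(0.0, 5.0, 6)   # deg, 5 offset bins
aeff = np.full((5, 30), 1e5)          # m^2, shape (n_theta, n_energy)

# one table row; axis and data columns are stored as vector columns
cols = fits.ColDefs([
    fits.Column("ENERG_LO", format="30E", unit="TeV", array=[e_edges[:-1]]),
    fits.Column("ENERG_HI", format="30E", unit="TeV", array=[e_edges[1:]]),
    fits.Column("THETA_LO", format="5E", unit="deg", array=[th_edges[:-1]]),
    fits.Column("THETA_HI", format="5E", unit="deg", array=[th_edges[1:]]),
    fits.Column("EFFAREA", format="150E", unit="m2", dim="(30,5)", array=[aeff]),
])
hdu = fits.BinTableHDU.from_columns(cols, name="EFFECTIVE AREA")
hdu.header["HDUCLASS"] = "GADF"
hdu.header["HDUCLAS1"] = "RESPONSE"        # assumed
hdu.header["HDUCLAS2"] = "EFF_AREA"
hdu.header["HDUCLAS3"] = "FULL-ENCLOSURE"
hdu.header["HDUCLAS4"] = "AEFF_2D"         # assumed
hdu.writeto("aeff_2d.fits", overwrite=True)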

Energy dispersion format

The format to store full-enclosure energy dispersion (see Energy Dispersion) is the following:

EDISP_2D

The energy dispersion information is saved as a BINTABLE HDU with the following required columns.

Columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • True energy axis
  • MIGRA_LO, MIGRA_HI – ndim: 1, unit: dimensionless
    • Energy migration axis (defined above)
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
    • Field of view offset axis
  • MATRIX – ndim: 3, unit: dimensionless
    • Energy dispersion \(\frac{dP}{d\mu}\) (defined above)

Recommended axis order: ENERGY, MIGRA, THETA

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: TODO

PSF format

The PSF of IACTs does not always have a Gaussian shape; it depends strongly on the analysis and on the specific instrument. For this reason, several parameterizations are allowed to store the PSF:

PSF_TABLE

This is a PSF FITS format we agree on for IACTs. The file contains the offset- and energy-dependent tabulated distribution of the PSF.

This format is almost identical to the OGIP radial PSF format. The differences are that we don’t have the dependency on azimuthal field of view position, the units are different and the recommended axis order is different (to have uniformity across axis order in the IACT DL3 IRFs).

Columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • True energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
    • Field of view offset axis
  • RAD_LO, RAD_HI – ndim: 1, unit: deg
    • Offset angle from source position
  • RPSF – ndim: 3, unit: sr^-1
    • Point spread function value \(dP/d\Omega\)

Recommended axis order: ENERGY, THETA, RAD.

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: TODO

PSF_3GAUSS

Multi-Gauss mixture models are a common way to model distributions (for source intensity profiles, PSFs, anything really), see e.g. 2013PASP..125..719H. For H.E.S.S., radial PSFs have been modeled as 1, 2 or 3 two-dimensional Gaussians \(dP/d\Omega\).

Note

A two-dimensional Gaussian distribution \(dP/d\Omega = dP/(dx\,dy)\) is equivalent to an exponential distribution in \(dP/dx\), where \(x=r^2\), and a Rayleigh distribution in \(dP/dr\).

In this format, the triple-Gauss distribution is parameterised as follows:

\[dP/d\Omega(r, S, \sigma_1, A_2, \sigma_2, A_3, \sigma_3) = \frac{S}{\pi} \left[ \exp\left(-\frac{r^2}{2\sigma_1^2}\right) + A_2 \exp\left(-\frac{r^2}{2\sigma_2^2}\right) + A_3 \exp\left(-\frac{r^2}{2\sigma_3^2}\right) \right],\]

where \(S\) is SCALE, \(\sigma_i\) is SIGMA_i and \(A_i\) is AMPL_i (see columns listed below).

TODO: give analytical formula for the integral, so that it’s easy to check if the PSF is normalised for a given set of parameters.

TODO: give test case value and Python function for easy checking?
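Regarding the TODOs above: in the small-angle approximation (\(d\Omega \approx 2\pi r\,dr\)), the distribution above integrates to

\[\int dP/d\Omega \, d\Omega \approx 2 S \left(\sigma_1^2 + A_2 \sigma_2^2 + A_3 \sigma_3^2\right),\]

with \(S\) in sr\(^{-1}\) and the \(\sigma_i\) in radians. A minimal sketch for checking the normalisation of a given parameter set:

import numpy as np

def gauss3_integral(scale, sigma_1, ampl_2, sigma_2, ampl_3, sigma_3):
    """Total probability of the triple-Gauss PSF (small-angle approximation).

    scale in sr^-1, sigmas in radians.
    """
    return 2 * scale * (sigma_1**2 + ampl_2 * sigma_2**2 + ampl_3 * sigma_3**2)

# a normalised single Gaussian (sigma = 0.1 deg) requires S = 1 / (2 sigma^2)
sigma = np.radians(0.1)
print(gauss3_integral(1 / (2 * sigma**2), sigma, 0.0, sigma, 0.0, sigma))  # 1.0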

Note

By setting the amplitudes of the 3rd (and 2nd) Gaussians to 0 one can implement double (or single) Gaussian models as well.

Columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • True energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
  • SCALE – ndim: 2, unit: sr^-1
    • Absolute scale of the 1st Gaussian
  • SIGMA_1, SIGMA_2, SIGMA_3 – ndim: 2, unit: deg
    • Model parameter (see formula above)
  • AMPL_2, AMPL_3 – ndim: 2, unit: none
    • Model parameter (see formula above)

Recommended axis order: ENERGY, THETA

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: TODO

PSF_KING

The King function parametrisation has been in use in astronomy as an analytical PSF model for many instruments, for example by Fermi-LAT (see 2013ApJ...765...54A).

The distribution has two parameters GAMMA \(\gamma\) and SIGMA \(\sigma\) and is given by the following formula:

\[dP/d\Omega(r,\sigma,\gamma) = \frac{1}{2\pi\sigma^2} \left(1-\frac{1}{\gamma}\right) \left(1+\frac{r^2}{2\gamma\sigma^2}\right)^{- \gamma}\]

This formula integrates to 1 (see Introduction).
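As a sketch, this can be verified numerically in the small-angle approximation (\(d\Omega \approx 2\pi r\,dr\)); the parameter values here are arbitrary:

import numpy as np
from scipy.integrate import quad

def king(r, sigma, gamma):
    """dP/dOmega of the King profile; r and sigma in radians."""
    return (1 / (2 * np.pi * sigma**2) * (1 - 1 / gamma)
            * (1 + r**2 / (2 * gamma * sigma**2)) ** -gamma)

sigma, gamma = np.radians(0.1), 2.5
total, _ = quad(lambda r: king(r, sigma, gamma) * 2 * np.pi * r, 0, np.radians(10))
print(total)  # ~1, up to truncation of the integration range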

Columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • True energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
  • GAMMA – ndim: 2, unit: none
    • Model parameter (see formula above)
  • SIGMA – ndim: 2, unit: deg
    • Model parameter (see formula above)

Recommended axis order: ENERGY, THETA

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: TODO

PSF_GTPSF

The FITS file has the following BinTable HDUs / columns:

  • PSF HDU
    • Energy – 1D (MeV)
    • Exposure – 1D (cm^2 s)
    • Psf – 2D (sr^-1), shape = (len(Energy) x len(Theta)) Point spread function value \(dP/d\Omega\), see Probability distributions.
  • THETA HDU
    • Theta – 1D (deg)

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: psf-fermi.fits

Background format

Here we specify two formats for the background template models (see Background) of a full-enclosure IRF:

  • BKG_2D models depend on ENERGY and THETA, i.e. are radially symmetric.
  • BKG_3D models depend on ENERGY and field of view coordinates DETX and DETY.
BKG_2D

The BKG_2D format contains a 2-dimensional array of post-select background rate, stored in the BINTABLE HDU format.

Required columns:
  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • Reconstructed energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
  • BKG – ndim: 2, unit: s^-1 MeV^-1 sr^-1
    • Absolute post-select background rate (expected background per time, energy and solid angle).

Recommended axis order: ENERGY, THETA

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: here.

BKG_3D

The BKG_3D format contains a 3-dimensional array of post-select background rate, stored in the BINTABLE HDU format.

Required columns:
  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • Reconstructed energy axis
  • DETX_LO, DETX_HI, DETY_LO, DETY_HI – ndim: 1, unit: deg
  • BKG – ndim: 3, unit: s^-1 MeV^-1 sr^-1
    • Absolute post-select background rate (expected background per time, energy and solid angle).

Recommended axis order: ENERGY, DETX, DETY

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Further header keywords:

  • FOVALIGN = ‘ALTAZ’ / ‘RADEC’
    • Alignment of the field-of-view coordinate system (see Field of view)

Example data file: here.

Notes

The background rate is not a "flux" or "surface brightness": it is already a count rate, so unlike gamma-ray flux and surface brightness models it doesn't need to be multiplied with the effective area to obtain predicted counts. The rate is computed per observation time, without any dead-time correction; don't use livetime when computing or using the background rate.
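As an illustration of this note, a minimal sketch of how predicted counts would be obtained from the stored rate for a single bin (names are hypothetical):

def bkg_counts(rate, e_lo, e_hi, solid_angle, ontime):
    """Predicted background counts in one bin.

    rate        : BKG value in s^-1 MeV^-1 sr^-1
    e_lo, e_hi  : energy bin edges in MeV
    solid_angle : solid angle of the spatial bin in sr
    ontime      : observation time in s (no dead-time correction)
    """
    return rate * (e_hi - e_lo) * solid_angle * ontime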

Point-like IRFs

Point-like IRFs have classically been used within the IACT community. Each IRF component is calculated from the events surviving an energy-dependent directional cut around the assumed source position.

The format of each point-like IRF component is analogous to the ones already described within the full-enclosure IRF specifications (see Full-enclosure IRFs), with certain differences listed in this section.

Any point-like IRF component should contain the header keyword:

  • HDUCLAS3 = POINT-LIKE
RAD_MAX

In addition to the IRFs, the actual directional cut applied to the data needs to be stored. The cut may be constant, or it may vary along several axes; the two cases are stored differently.

In case the angular cut is constant in energy and across the FoV, an additional header keyword may be added to the IRF HDU:

Header keyword:

  • RAD_MAX type: float, unit: deg
    • Radius of the directional cut applied to calculate the IRF, in degrees.

If this keyword is present, the science tools should assume the directional cut of this point-like IRF is constant over all axes. In case the angular cut is variable along any axis (reconstructed energy or FoV), an additional HDU is required to store these values. Note that any DL3 file with a point-like IRF (with HDUCLAS3 = POINT-LIKE) that has no RAD_MAX keyword within the HDU metadata should have this additional HDU.

In case the directional cut is variable with energy or the FoV, point-like IRFs require an additional binary table. It stores the values of RAD_MAX as a function of the reconstructed energy and FoV following the BINTABLE HDU format.
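A sketch of the lookup logic a science tool might implement, assuming astropy and a hypothetical extension name RAD_MAX for the RAD_MAX_2D table:

from astropy.io import fits

def get_rad_max(filename, irf_hdu):
    """Return a constant RAD_MAX (float, deg) or the RAD_MAX_2D table data."""
    with fits.open(filename) as hdus:
        header = hdus[irf_hdu].header
        if header.get("HDUCLAS3") != "POINT-LIKE":
            raise ValueError("not a point-like IRF")
        if "RAD_MAX" in header:
            return header["RAD_MAX"]  # cut is constant over all axes
        # energy/offset-dependent table (HDU name assumed here)
        return hdus["RAD_MAX"].data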

RAD_MAX_2D

The RAD_MAX_2D format contains a 2-dimensional array of directional cut values, stored in the BINTABLE HDU format.

Required columns:

  • ENERG_LO, ENERG_HI – ndim: 1, unit: TeV
    • Reconstructed energy axis
  • THETA_LO, THETA_HI – ndim: 1, unit: deg
    • Field of view offset axis
  • RAD_MAX – ndim: 2, unit: deg
    • Radius of the directional cut applied to calculate the IRF, in degrees.

Recommended axis order: ENERGY, THETA

Header keywords:

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Example data file: here.

Data storage

At the moment there is no agreed way to organise IACT data, and how to connect EVENTS with IRFs or other information such as time or pointing information that is needed for analysis.

Here we document one scheme that is used extensively in H.E.S.S., and partly also by other IACTs. We expect that it will be superseded in the future by a different scheme developed by CTA.

The basic idea is that current IACT data consists of “runs” or “observations” with a given OBS_ID, and that for each observation there is one EVENTS and several IRF FITS HDUs that contain everything needed to analyse that data.

A second idea is that with H.E.S.S. we export all data to FITS, so we have many thousands of observations. Users usually need to do a run selection, e.g. by sky position or observation time, and they want to do that in an efficient way that doesn't require globbing thousands of files and opening up the FITS headers to find out what data is present.

There are two index tables:

Observation index table

The observation index table is stored in a FITS file as a BINTABLE HDU:

  • Suggested filename: obs-index.fits.gz
  • Suggested HDU name: OBS_INDEX

It contains one row per observation (a.k.a. run) and lists parameters that are commonly used for observation selection, grouping and analysis.

Required columns
  • OBS_ID type: int
    • Unique observation identifier (Run number)
  • OBS_MODE type: string
    • Observation mode. See notes on OBS_MODE.
  • RA_PNT type: float, unit: deg
    • Pointing Right Ascension (see RA / DEC). Not mandatory if OBS_MODE=DRIFT, but average values could optionally be provided.
  • DEC_PNT type: float, unit: deg
    • Pointing declination (see RA / DEC). Not mandatory if OBS_MODE=DRIFT, but average values could optionally be provided.
  • ALT_PNT type: float, unit: deg
    • Pointing Altitude (see Alt / Az). Only mandatory if OBS_MODE=DRIFT
  • AZ_PNT type: float, unit: deg
    • Pointing azimuth (see Alt / Az). Only mandatory if OBS_MODE=DRIFT
  • TSTART type: float, unit: s
    • Start time of observation relative to the reference time (see Time)
  • TSTOP type: float, unit: s
    • End time of observation relative to the reference time (see Time)
Optional columns

The following columns are optional. They are sometimes used for observation selection or data quality checks or analysis, but aren’t needed for most users.

  • ZEN_PNT type: float, unit: deg
    • Observation pointing zenith angle at observation mid-time TMID (see Alt / Az)
  • ONTIME type: float, unit: s
    • Total observation time (including dead time).
    • Equals TSTOP - TSTART
  • LIVETIME type: float, unit: s
    • Total livetime (observation minus dead time)
  • DEADC type: float
    • Dead time correction.
    • It is defined such that LIVETIME = DEADC * ONTIME i.e. the fraction of time the telescope was actually able to take data.
  • DATE-OBS type: string
    • Observation start date (see Time)
  • TIME-OBS type: string
    • Observation start time (see Time)
  • DATE-END type: string
    • Observation end date (see Time)
  • TIME-END type: string
    • Observation end time (see Time)
  • N_TELS type: int
    • Number of participating telescopes
  • TELLIST type: string
    • Telescope IDs (e.g. ‘1,2,3,4’)
  • QUALITY type: int
    • Observation data quality. The following levels of data quality are defined:
      • -1 = unknown quality
      • 0 = best quality, suitable for spectral analysis.
      • 1 = medium quality, suitable for detection, but not spectra (typically if the atmosphere was hazy).
      • 2 = bad quality, usually not to be used for analysis.
  • OBJECT type: string
    • Primary target of the observation
    • Recommendations:
      • Use a name that can be resolved by SESAME
      • For survey observations, use “survey”.
      • For Off Observation, use “off observation”
  • GLON_PNT type: float, unit: deg
    • Observation pointing Galactic longitude (see Galactic).
  • GLAT_PNT type: float, unit: deg
    • Observation pointing Galactic latitude (see Galactic).
  • RA_OBJ type: float, unit: deg
    • Right ascension of OBJECT
  • DEC_OBJ type: float, unit: deg
    • Declination of OBJECT
  • TMID type: float, unit: s
    • Mid time of observation relative to the reference time
  • TMID_STR type: string
    • Mid time of observation in UTC string format: “YYYY-MM-DD HH:MM:SS”
  • EVENT_COUNT type: int
    • Number of events in run
  • EVENT_RA_MEDIAN type: float, unit: deg
    • Median right ascension of events
  • EVENT_DEC_MEDIAN type: float, unit: deg
    • Median declination of events
  • EVENT_ENERGY_MEDIAN type: float, unit: TeV
    • Median energy of events
  • EVENT_TIME_MIN type: double, unit: s
    • First event time
  • EVENT_TIME_MAX type: double, unit: s
    • Last event time
  • BKG_SCALE type: float
    • Observation-dependent background scaling factor. See notes below.
  • TRGRATE type: float, unit: Hz
    • Mean system trigger rate
  • ZTRGRATE type: float, unit: Hz
    • Zenith equivalent mean system trigger rate
    • Some HESS chains export this at the moment and this quantity can be useful for data selection. Comparing values from different chains or other telescopes would require a more specific specification.
  • MUONEFF type: float
    • Mean muon efficiency
    • Currently use definitions from analysis chain, since creating a unified specification is tricky.
  • BROKPIX type: float
    • Fraction of broken pixels (0.15 means 15% broken pixels)
  • AIRTEMP type: float, unit: deg C
    • Mean air temperature at ground during the observation.
  • PRESSURE type: float, unit: hPa
    • Mean air pressure at ground during the observation.
  • NSBLEVEL type: float, unit: a.u.
    • Measure for NSB level
    • Some HESS chains export this at the moment and this quantity can be useful for data selection. Comparing values from different chains or other telescopes would require a more specific specification.
  • RELHUM type: float
    • Relative humidity at ground during the observation
Mandatory Header keywords

The standard FITS reference time header keywords should be used (see Formats). An observatory Earth location should be given as well (see Earth location).

As explained in HDU classes, the following header keyword should be used to declare the type of HDU:

Notes
  • Observation runs where the telescopes don’t point to a fixed RA / DEC position (e.g. drift scan runs) aren’t supported at the moment by this format.
  • Purpose / definition of BKG_SCALE: For a 3D likelihood analysis, a good estimate of the background is important. The run-by-run variation of the background rate is ~20%, mainly due to changing atmospheric conditions. This parameter allows to specify (from separate studies) a scaling factor to the background model; this factor comes e.g. from the analysis of off runs. The background normalisation usually depends on e.g. the number of events in a run, the zenith angle and other parameters. This parameter provides the possibility to give the user a better prediction of the background normalisation. For CTA this might be deduced from atmospheric monitoring and additional diagnostic input. For HESS we try to find a trend in the off-run background normalisations and other parameters such as the number of events per unit livetime. The background scale should be around 1.0 if the background model is good; it should also be set to 1.0 if no dependency analysis has been performed. If the background model normalisation is off by a few orders of magnitude for some reason, this can also be incorporated here.

HDU index table

The HDU index table is stored in a FITS file as a BINTABLE HDU:

  • Suggested filename: hdu-index.fits.gz
  • Suggested HDU name: HDU_INDEX

The HDU index table can be used to locate HDUs. E.g. for a given OBS_ID and (HDU_TYPE and / or HDU_CLASS), the HDU can be located via the information in the FILE_DIR, FILE_NAME and HDU_NAME columns. The path listed in FILE_DIR has to be relative to the location of the index file.

BASE_DIR

By default, file locations should be relative to the location of this HDU index file, i.e. the total file path is PATH_INDEX_TABLE / FILE_DIR / FILE_NAME. Tools are expected to support relative file paths in POSIX notation like FILE_DIR = "../../data/" as well as absolute file path like FILE_DIR = "/data/cta".

To allow for some additional flexibility, an optional header keyword BASE_DIR can be used. If it is given, the file path is BASE_DIR / FILE_DIR / FILE_NAME, i.e. the location of the HDU index table becomes irrelevant.
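A minimal sketch of this path resolution logic, assuming the index table row has been read into a mapping (e.g. an astropy Table row):

from pathlib import Path

def hdu_location(index_file, row, base_dir=None):
    """Resolve the file path for one row of the HDU index table."""
    file_dir = Path(row["FILE_DIR"])
    if base_dir is not None:        # optional BASE_DIR header keyword
        return Path(base_dir) / file_dir / row["FILE_NAME"]
    if file_dir.is_absolute():      # e.g. FILE_DIR = "/data/cta"
        return file_dir / row["FILE_NAME"]
    # default: relative to the location of the HDU index file itself
    return Path(index_file).parent / file_dir / row["FILE_NAME"]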

Columns
  • OBS_ID – type: int, required
    • Observation ID (a.k.a. run number)
  • HDU_TYPE – type: string, required
    • HDU type (see below)
  • HDU_CLASS – type: string, required
    • HDU class (see below)
  • FILE_DIR – type: string, required
    • Directory of the file, relative to this index file
  • FILE_NAME – type: string, required
    • Name of the file
  • HDU_NAME – type: string, required
    • Name of the HDU in the file
  • SIZE – type: int, optional
    • File size (bytes)
  • MTIME – type: double, optional
    • Modification time
  • MD5 – type: string, optional
    • Checksum
Mandatory Header keywords

As explained in HDU classes, the following header keyword should be used to declare the type of HDU (this HDU itself, the HDU index table):

HDU_TYPE and HDU_CLASS

The HDU_TYPE and HDU_CLASS can be used to select the HDU of interest.

The difference is that HDU_TYPE corresponds generally to e.g. PSF, whereas HDU_CLASS corresponds to a specific PSF format. Declaring HDU_CLASS here means that tools loading these files don’t have to do guesswork to infer the format on load.

Valid HDU_TYPE values:

  • events - Event list
  • gti - Good time interval
  • aeff - Effective area
  • psf - Point spread function
  • edisp - Energy dispersion
  • bkg - Background
  • rad_max - Directional cut for point-like IRFs

Valid HDU_CLASS values:

Relation to HDUCLAS

At HDU classes and throughout this spec, HDUCLAS header keys are defined as a declarative HDU classification scheme. This appears similar to this HDU index table, but in reality is different and incompatible!

Here in the index table, we have HDU_CLASS and HDU_TYPE. In HDU classes, there is HDUCLASS, which is always "GADF", and then there is a hierarchical HDUCLAS1, HDUCLAS2, HDUCLAS3 and HDUCLAS4 that corresponds to the information in HDU_CLASS and HDU_TYPE here. Also the values are different: here we have lower-case and use e.g. HDU_CLASS="aeff", whereas in HDU classes we use upper-case and e.g. HDUCLAS2="EFF_AREA".

One reason for these inconsistencies is that the spec for this HDU index table was written first (in 2015), and then HDU classes was introduced later (in 2017). Another reason is that here, we tried to be simple (flat, not hierarchical classification), and for HDU classes we tried to follow an existing standard.

Given that these index tables have been in use in the past years, and that we expect them to be replaced soon by a completely different way to locate and link IACT data HDUs, we decided to keep this format as-is, instead of trying to align it with HDU classes and create some form of hierarchical index table.

The observation index table provides metadata about each observation run, e.g. the pointing in the sky, duration, number of events, etc. The HDU index table provides a list of all available HDUs and the files in which they are located. Science tools can make use of these index files to build the filenames of required files according to user parameters.

Note that the HDU index table would be superfluous if IRFs were always bundled in the same file as EVENTS and if the observation index table contained the location of that file. For HESS this wasn't done initially, because the background IRFs were large in size and re-used for many runs. The level of indirection that the HDU index table offers makes it possible to support both IRFs bundled with EVENTS ("per-run IRFs" as used in HESS) and a global lookup database of IRFs located separately from EVENTS (sometimes called a CALDB), as used for the CTA first data challenge.

Sky Maps

This page describes data format conventions for FITS binned data and model representations pixelized with the HEALPix algorithm.

HEALPix Formats

This section describes a proposal for HEALPix format conventions which is based on formats currently used within the Fermi Science Tools (STs) and pointlike. This format is intended for representing maps and cubes of both integral and differential quantities including:

  • Photon count maps and cubes (e.g. as generated with gtbin).
  • Exposure cubes (e.g. as generated with gtexpcube2).
  • Source maps – product of exposure with instrument response in spatial dimension (e.g. as generated with gtsrcmaps).
  • Model maps and cubes (the Fermi IEM and other diffuse emission components).

The format defines a SKYMAP HDU for storing a sequence of image slices (bands) and a BANDS HDU to store the geometry and coordinate mapping for each band. A band can represent any selection on non-spatial coordinates such as energy, time, or FoV angle. The most common use-case is a sequence of bands representing energy bins (for counts maps) or energy nodes (for source or model maps).

There are four primary HEALPix map formats which use different table structures for mapping table entries to HEALPix pixel and band:

  • IMPLICIT Format: The row to pixel mapping is determined implicitly from the row number. The row number corresponds to the HEALPix pixel number.
  • EXPLICIT Format: The row to pixel mapping is determined explicitly from the PIX column. This can be used to define maps or cubes that only encompass a partial region of the sky.
  • LOCAL Format: The row to pixel mapping is determined explicitly from the PIX column but with a local indexing scheme. This can be used to define maps or cubes that only encompass a partial region of the sky.
  • SPARSE Format: The row to pixel mapping is determined explicitly from the PIX column but with a variable number of pixels in each band. This format can be used to represent maps that have a different spatial geometry in each band and also supports band-dependent pixel size.

Note that there are variations of these primary formats which use different conventions for column, HDU, and header keyword names. The HPX_CONV keyword defines a specific mapping between the names used here and in these other formats:

  • FGST_CCUBE
  • FGST_LTCUBE
  • FGST_BEXPCUBE
  • FGST_SRCMAP
  • FGST_TEMPLATE
  • FGST_SRCMAP_SPARSE
  • GALPROP
  • GALPROP2
Non-Standard HEALPix Conventions
Sample Files
  • All-sky Counts Cube (IMPLICIT Format): FITS
  • Partial-sky Counts Cube (EXPLICIT Format): FITS
  • Partial-sky Counts Map (EXPLICIT Format): FITS
  • Partial-sky Counts Cube (SPARSE Format w/ fixed NSIDE): FITS
  • Partial-sky Counts Cube (SPARSE Format w/ variable NSIDE): FITS
SKYMAP HDU

The SKYMAP table contains the map data and row-to-pixel mapping, formatted according to one of the indexing schemes specified by the INDXSCHM header keyword: IMPLICIT Format, EXPLICIT Format, LOCAL Format, or SPARSE Format. By convention, if a file contains a single map it is recommended to name the extension SKYMAP. For maps with non-spatial dimensions an accompanying BANDS table must also be defined (see BANDS HDU).

Header Keywords

This section lists the keywords that should be written to the SKYMAP BINTABLE header. These keywords define the pixel size and ordering scheme that was used to construct the HEALPix map.

  • PIXTYPE, type: string
    • Should be set to HEALPIX.
  • INDXSCHM, type: string
    • Indexing scheme. Can be one of IMPLICIT, EXPLICIT, LOCAL, or SPARSE. If this keyword is not provided then the IMPLICIT indexing scheme will be assumed.
  • ORDERING, type: string
    • HEALPix ordering scheme. Can be NESTED or RING.
  • COORDSYS, type: string
    • Map coordinate system. Can be CEL (celestial coordinates) or GAL (galactic coordinates).
  • ORDER, type: int
    • HEALPix order. ORDER = log2(NSIDE) if NSIDE is a power of 2 and -1 otherwise. If the BANDS table is defined this keyword is superseded by the NSIDE column.
  • NSIDE, type: int
    • Number of HEALPix pixels per side. If the BANDS table is defined this keyword is superseded by the NSIDE column.
  • FIRSTPIX, type: int
    • Index of first pixel in the map.
  • LASTPIX, type: int
    • Index of last pixel in the map.
  • HPX_REG, type: string, optional
    • Region string for the geometric selection that was used to construct a partial-sky map. See HEALPix Region String for additional details.
  • HPX_CONV, type: string, optional
    • Name of the format convention used for column, HDU, and header keyword names (one of the variants listed above, e.g. FGST_CCUBE).
  • BANDSHDU, type: string, optional
    • Name of HDU containing the BANDS table. If undefined the extension name should be EBOUNDS or ENERGIES. See BANDS HDU for additional details.
HEALPix Region String

For partial-sky maps that correspond to a particular geometric selection one can optionally specify the selection that was used in constructing the map with the HPX_REG header keyword. The following region strings are supported:

  • DISK({LON},{LAT},{RADIUS}): A circular selection centered on the coordinates ({LON}, {LAT}) with radius {RADIUS} with all quantities given in degrees. A pixel is included in the selection if its center is within {RADIUS} deg of coordinates ({LON}, {LAT}).
  • DISK_INC({LON},{LAT},{RADIUS}): A circular selection centered on the coordinates ({LON}, {LAT}) with radius {RADIUS} with all quantities given in degrees. A pixel is included in the selection if any part of it is encompassed by the circle.
  • HPX_PIXEL({ORDERING},{ORDER},{PIX}): A selection encompassing all pixels contained in the HEALPix pixel of the given pixelization where ordering is {ORDERING} (i.e. NESTED or RING), order is {ORDER}, and pixel index is {PIX}.
IMPLICIT Format

The IMPLICIT format defines a one-to-one mapping between table row and HEALPix pixel index. Each energy plane is represented by a separate column (CHANNEL0, CHANNEL1, etc.). This format can only be used for all-sky maps.

SKYMAP Columns
  • CHANNEL{BAND_IDX} – ndim: 1, type: float or int
    • Dimension: nrows
    • Amplitude in image plane {BAND_IDX}. The HEALPix pixel index is determined from the table row.
EXPLICIT Format

The EXPLICIT format uses an additional PIX column to explicitly define the pixel number corresponding to each table row. Pixel values for each band are represented by a separate column (CHANNEL0, CHANNEL1, etc.). This format can be used for both all-sky and partial-sky maps but requires the same pixel size for all bands.

HEADER
  • INDXSCHM : EXPLICIT
SKYMAP Columns
  • PIX – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • HEALPix pixel index. This index is common to all bands.
  • CHANNEL{BAND_IDX} – ndim: 1, type: float or int
    • Dimension: nrows
    • Amplitude in HEALPix pixel PIX and band {BAND_IDX}.
LOCAL Format

The LOCAL format is identical to the EXPLICIT Format with the exception that the PIX column contains a local rather than global HEALPix index. The local HEALPix index is a mapping of global indices in a partial-sky geometry to 0, …, N-1 where N is the total number of pixels in the geometry. For an all-sky geometry the local index is equal to the global index. This format can be used for both all-sky and partial-sky maps as well as maps with a different pixel-size in each band.

HEADER
  • INDXSCHM : LOCAL
SKYMAP Columns
  • PIX – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • Local HEALPix pixel index. The mapping to global HEALPix index is derived by finding the index of the ith pixel in the geometry where pixels are ordered by their global index values.
  • CHANNEL{BAND_IDX} – ndim: 1, type: float or int
    • Dimension: nrows
    • Amplitude in HEALPix pixel PIX and band {BAND_IDX}.
SPARSE Format

The SPARSE format allows for an arbitrary set of pixels to be defined in each band. The SKYMAP table contains three columns with the pixel index, band index, and amplitude. Pixel values for each band are contiguous and arranged in order of band index. This format supports a different HEALPix pixel size in each band, defined by the NSIDE column in the BANDS table.

Pixels that are undefined but contained within the geometric selection are assumed to be zero while pixels outside the geometric selection are undefined. For counts data this allows for maps that only contain pixels with at least one count.

HEADER
  • INDXSCHM : SPARSE
SKYMAP Columns
  • PIX – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • HEALPix pixel index. Pixels are ordered by band number. The row to band mapping is defined by the CHANNEL column. The column type may be either 32- or 64-bit depending on the maximum index of the map geometry. A 32-bit column type is sufficient for maps with NSIDE as large as 8192.
  • CHANNEL – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • Band index. This column is optional for maps with a single band. For efficiency it is recommended to represent this column with a 16-bit integer.
  • VALUE – ndim: 1, unit: None, type: float or int
    • Dimension: nrows
    • Amplitude in pixel indexed by PIX and CHANNEL.
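As an illustration, a minimal numpy sketch expanding the three SPARSE columns into a dense array; as described above, pixels absent from the table are filled with zero:

import numpy as np

def sparse_to_dense(pix, channel, value, npix, nbands):
    """Expand SPARSE-format columns into a dense (nbands, npix) array."""
    dense = np.zeros((nbands, npix))
    dense[channel, pix] = value  # undefined pixels stay zero
    return dense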

WCS Formats

This page describes data format conventions for images and cubes pixelized with World Coordinate Systems (WCS). WCS is a system for describing transformations between pixel coordinates and world (sky) coordinates. The conventions described here are extensions to the FITS WCS standard which allow for serializing more complex data structures, such as sparse maps or maps with non-regular geometry (e.g. a different pixel size in each image plane).

The format splits the specification of an image into two HDUs: an Image HDU and a BANDS HDU. The image HDU contains the image data while the BANDS HDU is an optional extension that stores information about non-spatial dimensions.

There are two supported WCS formats:

  • IMAGE Format: This is the standard FITS image format whereby data is stored in a FITS ImageHDU. With the exception of the handling of non-spatial dimensions, images generated with this format follow standard FITS conventions.
  • SPARSE Format: This is a sparse data format for FITS images which uses a BinTableHDU to store pixel values.

By default multi-dimensional maps are assumed to have the same projection in each image plane with the WCS projection specified in the header keywords.

Both formats support the specification of non-regular multi-dimensional geometries through the inclusion of NPIX, CRPIX, and CDELT columns in the BANDS HDU. A non-regular geometry is one in which each image plane may have a different pixelization (e.g. different pixel size or image extent). The NPIX, CRPIX, and CDELT columns override the pixelization of the base WCS projection (defined in the IMAGE HDU header) in each image plane.

Sample Files
  • Counts Cube: FITS
  • Counts Cube (Multi-resolution/non-regular): FITS
  • Counts Cube (SPARSE Format): FITS
Image HDU

The IMAGE HDU contains the map data and can be formatted according to either the IMAGE Format or SPARSE Format scheme.

WCS FITS Header Keywords

The keywords listed here are those required by the FITS WCS specification.

  • CRPIX{IDX}, type: float
    • Pixel coordinate of reference point for axis {IDX}.
  • CDELT{IDX}, type: float
    • Coordinate increment at reference point for axis {IDX}.
  • CTYPE{IDX}, type: string
    • Coordinate system and projection code for axis {IDX}.
  • CRVAL{IDX}, type: float
    • Coordinate value at reference point for axis {IDX}.
Extra Header Keywords

These are extra keywords that are not part of the FITS WCS specification but are required for some features of the GADF format.

  • WCSSHAPE, type: string, optional
    • Comma-separated list with the number of pixels in each dimension in row-major order, e.g. (4,5,3) would correspond to a 4x5 image with three image planes. For non-regular geometries the image dimension should be the maximum across all image planes. This keyword is optional for maps in the IMAGE Format.
  • BANDSHDU, type: string, optional
    • Name of HDU containing the BANDS table. If undefined the extension name should be EBOUNDS or ENERGIES. See BANDS HDU for additional details.
IMAGE Format

The IMAGE format uses an ImageHDU to store map values. Dimensions of the image are directly inferred from the data member of the HDU. This is the standard format for WCS-based maps.

SPARSE Format

The SPARSE WCS format uses the same conventions as the Sparse HEALPix Format. This format uses a BinTableHDU to store map values with one row per pixel.

Columns
  • PIX – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • Flattened pixel index in a given image plane (band). Indices are flattened assuming a column-major ordering for the image. The row to band mapping is defined by the CHANNEL column.
  • CHANNEL – ndim: 1, unit: None, type: int
    • Dimension: nrows
    • Band index. This column is optional for maps with a single band. For efficiency it is recommended to represent this column with a 16-bit integer.
  • VALUE – ndim: 1, unit: None, type: float or int
    • Dimension: nrows
    • Amplitude in pixel indexed by PIX and CHANNEL.

Both the HEALPix and WCS pixelization formats described above store the data in a single IMAGE HDU, which can be either an ImageHDU or a BinTableHDU depending on the image format. A sky map may have one or more non-spatial dimensions (e.g. energy or time), such that the data consists of a sequence of image planes; a BANDS HDU is required for maps with non-spatial dimensions.

BANDS HDU

For maps with non-spatial dimensions, the BANDS table defines the geometry in each band and the band to coordinate mapping for non-spatial dimensions (e.g. energy). The BANDS table is optional for maps with a single band.

The extension name of the BANDS table associated to an image HDU is given by the BANDSHDU header keyword of the image HDU. If BANDSHDU is undefined the BANDS table should be read from the EBOUNDS or ENERGIES HDU. The BANDS table extension names EBOUNDS and ENERGIES are reserved for maps with a third energy dimension and are supported for backward compatibility with existing file format conventions of the Fermi STs. Although each map will have its own IMAGE HDU, a single BANDS table can be associated to multiple maps (if they share the same geometry).

The BANDS table contains 1 row per band with columns that define the mapping of the band to the non-spatial dimensions of the map. For integral quantities (e.g. counts) this should be the lower and upper edge values of the bin. For differential quantities this should be the coordinates at which the map value was evaluated. Some examples of quantities that can be used to define bands are as follows:

  • Energy (Integral): E_MIN, E_MAX
  • Energy (Differential): ENERGY
  • Event Type: EVENT_TYPE
  • Time: TIME_MIN, TIME_MAX
  • FoV Angle: THETA_MIN, THETA_MAX

The mapping of column to non-spatial dimension is defined with the AXCOLS{IDX} header keyword. For maps with multiple non-spatial dimensions, the mapping of channel number to band indices follows a column-major ordering. For instance, for a map with a first energy dimension with 3 bins indexed by i and a second time dimension with 2 bins indexed by j, the band index mapping to channel number k would be as follows:

(i, j) = (0,0) : (k) = (0,)
(i, j) = (1,0) : (k) = (1,)
(i, j) = (2,0) : (k) = (2,)
(i, j) = (0,1) : (k) = (3,)
(i, j) = (1,1) : (k) = (4,)
(i, j) = (2,1) : (k) = (5,)
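A one-line sketch of this column-major mapping for the example above (3 energy bins indexed by i, 2 time bins indexed by j):

def band_to_channel(i, j, n_i=3):
    """Column-major mapping of band indices (i, j) to channel number k."""
    return i + n_i * j

assert band_to_channel(2, 1) == 5  # matches the last row of the example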
Header Keywords
  • AXCOLS{IDX}, type: string, optional
    • Comma-separated list of columns in the BANDS table corresponding to non-spatial dimension with one-based index {IDX}. If there are two elements the columns should be interpreted as the lower and upper edges of each bin. If there is a single element the column should be interpreted as the bin center. For EBOUNDS or ENERGIES HDUs this keyword is optional.

For time based axes the additional keywords described in Formats are required.

Columns
  • CHANNEL, ndim: 1
    • Dimension: nbands
    • Unique index of the band. If this column is not defined then the band index should be inferred from the row number indexing from zero. For maps with multiple non-spatial dimensions the index mapping to channel number follows a column-major ordering.
  • E_MIN, ndim: 1, unit: keV, optional
    • Dimension: nbands
    • Lower energy bound for integral quantities.
  • E_MAX, ndim: 1, unit: keV, optional
    • Dimension: nbands
    • Upper energy bound for integral quantities.
  • ENERGY, ndim: 1, unit: keV, optional
    • Dimension: nbands
    • Energy value for differential quantities.
  • EVENT_TYPE, ndim: 1, optional
    • Dimension: nbands
    • Integer key for a sequence of independent data subselections (e.g. FRONT/BACK-converting LAT events).
WCS Columns

This section lists BANDS table columns specific to the WCS map format. These are optional columns to specify the pixelization for non-regular geometries.

  • NPIX, ndim: 2, type: int, optional
    • Number of pixels in longitude and latitude in each image plane.
  • CRPIX, ndim: 2, type: float, optional
    • Reference pixel coordinate [deg] in longitude and latitude in each image plane.
  • CDELT, ndim: 2, type: float, optional
    • Pixel size [deg] in longitude and latitude in each image plane.
HPX Columns

This section lists BANDS table columns specific to the HEALPix map format.

  • NSIDE – ndim: 1
    • Dimension: nbands
    • NSIDE of the HEALPix pixelization in this band. If not defined then NSIDE should be inferred from the FITS header. Only required for formats that support energy-dependent pixelization (SPARSE).

Spectra

Here we describe data formats for high-level spectral analysis results.

Science tools are encouraged to use these formats for easy interoperability with other codes (e.g. to check results, combine results in one plot, …).

SED

The SED format is a flexible specification for representing one-dimensional spectra (distributions of amplitude vs. energy). The SED is structured as a table with one row per energy bin/point and columns for the energy, measured normalization, and normalization errors. The format supports both integral and differential representations of the normalization as described in Normalization Representation.

The list of supported columns is given in the Columns section. All columns are optional by default, and an SED may contain any combination of the allowed columns. SED Types are a specification for defining groups of columns that are required to be present in the file. The Likelihood SED format is an example of an SED type that contains both measured flux points and the profile likelihoods versus normalization in each energy bin.

The SED format does not enforce a specific set of units but units should be defined in the column metadata for all quantities with physical dimensions. Recommended units are provided in the Columns section to indicate the dimensionality of each column. Column UCDs are intended for defining the type of physical quantity associated to each column (e.g. energy, photon flux, etc.). The convention for including UCDs in the column metadata is still under discussion and the UCDs defined in the current format are optional.

The intended serialization format is a FITS BINTABLE with one row per energy bin. However, any serialization format that supports tabular data and column metadata could also be supported (e.g. ECSV or HDF5). Because the SED occupies a single HDU, multiple SEDs can be written to a single FITS file, with an identifier (e.g. source name or observation epoch) used as the HDU name. Sample FITS and ECSV files are provided in Sample Files.

Normalization Representation

The SED format supports both differential and integral representations of the source normalization. Integral representations correspond to quantities integrated over an energy bin as defined by the e_min and e_max columns. Differential representations are quantities evaluated at discrete energies defined by the e_ref column. The supported Normalization Columns are:

  • dnde : Differential photon flux at e_ref. Dimensionality: photons / (time * area * energy)
  • e2dnde : Defined as (e_ref ^ 2) * dnde. A commonly published and plotted differential flux quantity. Dimensionality: energy / (time * area)
  • rate : Photon rate between e_min and e_max. Dimensionality: photons / time.
  • flux : Photon flux (integral of dnde) between e_min and e_max. Dimensionality: photons / ( time * area )
  • eflux : Energy flux (integral of E times dnde) between e_min and e_max. Dimensionality: energy / ( time * area )
  • npred : Photon counts between e_min and e_max. Dimensionality: photons
  • norm : Normalization in units of the reference model. Dimensionality: unitless

An SED should contain at least one of the normalization representations listed above. Multiple representations (e.g. flux and dnde) may be included in a single SED.

The dnde and e2dnde representations are equivalent. We define both here, because both are in common use for publications and plots.

Errors and upper limits on the normalization are defined with the Error Columns by appending the appropriate suffix to the normalization column name. For example in the case of photon flux the error and upper limit columns are:

  • flux_err : Symmetric 1-sigma error.
  • flux_errp : Asymmetric 1-sigma positive error.
  • flux_errn : Asymmetric 1-sigma negative error.
  • flux_ul : Upper limit with confidence level given by UL_CONF header keyword.

A row may have a value and any combination of upper limits and errors:

e_ref dnde dnde_err dnde_errp dnde_errn dnde_ul
1000.0 1.0 0.1 0.1 0.1 1.16
3000.0 1.0 0.1 0.1 0.1 1.16

A nan value should be used for empty or missing data such as a bin for which there is an upper limit but no value and vice versa, e.g.

e_ref dnde dnde_err dnde_ul
1000.0 1.0 0.1 nan
3000.0 nan nan 1.16

The is_ul column is an optional boolean flag that can be used to designate whether the measurement in a given row should be interpreted as a two-sided confidence interval or an upper limit:

e_ref dnde dnde_err dnde_ul is_ul
1000.0 1.0 0.1 nan False
3000.0 nan nan 1.16 True

Setting UL values to nan is an implicit way of flagging rows that do not contain an upper limit. When parsing an SED one should first check for the existence of the is_ul column and otherwise check for nan values in the UL column. The is_ul column is only required if you want to explicitly flag ULs when both the UL and two-sided interval may be defined in a row.
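A minimal parsing sketch following this recommendation, assuming the SED has been read into an astropy Table (names are hypothetical):

import numpy as np

def row_is_ul(table, idx, norm_rep="dnde"):
    """Decide whether row idx of an SED table is an upper limit."""
    if "is_ul" in table.colnames:          # explicit flag takes precedence
        return bool(table["is_ul"][idx])
    # else: a nan upper limit means the row is not an upper limit
    return not np.isnan(table[f"{norm_rep}_ul"][idx])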

Reference Model

The reference model of an SED is the global parameterization that was used to extract the normalization in each energy bin. The choice of reference model is relevant when considering models that are rapidly changing across a bin or when energy dispersion is large relative to the bin size. The Reference Model Columns define the reference model in different representations. When an SED includes a reference model, the normalizations, errors, and upper limits can be given in the norm representation which is expressed in units of the reference model. norm columns can be converted to another representation by performing an element-wise multiplication of the column with the ref column of the desired representation.
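For example, assuming the SED has been read into an astropy Table named table, converting the norm representation to differential flux could look like this sketch:

# element-wise conversion of norm columns to the dnde representation
dnde = table["norm"] * table["ref_dnde"]
dnde_err = table["norm_err"] * table["ref_dnde"]
dnde_ul = table["norm_ul"] * table["ref_dnde"]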

Likelihood Profiles

The Likelihood Columns contain values of the model likelihood and likelihood profiles versus normalization. Likelihood profiles provide additional information about the measurement uncertainty in each bin. A more detailed discussion of the motivation for SED likelihood profiles can be found in Likelihood SED.

SED Types

By default all columns in the SED format are optional. To facilitate interoperability of files produced by different packages/tools, the SED format defines an SED Type string which is set with the SED_TYPE header keyword. The SED type defines a minimal set of columns that must be present in the SED. The SED types and their required columns are given in the following list:

  • dnde: e_ref, dnde
  • e2dnde: e_ref, e2dnde
  • flux: e_min, e_max, flux
  • eflux: e_min, e_max, eflux
  • likelihood: See Likelihood SED.
Sample Files
Header Keywords
  • UL_CONF, optional
    • Confidence level of the upper limit (range: 0 to 1) of the value in the {NORM_REP}_ul column.
  • SED_TYPE, optional
    • SED type string (see SED Types for more details).
Columns

This section lists the column specifications. Unless otherwise specified the data type of all columns should be float32 or float64.

Energy Columns
  • e_min – ndim: 1, unit: MeV
    • Dimension: nebins
    • ucd : em.energy
    • Lower edge of energy bin. This defines the lower integration bound for integral representations of the normalization. Can also define the energy band used to evaluate differential representations (dnde or e2dnde).
  • e_max – ndim: 1, unit: MeV
    • Dimension: nebins
    • ucd : em.energy
    • Upper edge of energy bin. This defines the upper integration bound for integral representations of the normalization. Can also define the energy band used to evaluate differential representations (dnde or e2dnde).
  • e_ref – ndim: 1, unit: MeV
    • Dimension: nebins
    • ucd : em.energy
    • Energy at which differential representations of the normalization are evaluated (e.g. dnde). This can be the geometric center of the bin or some weighted average of the energy distribution within the bin.
Normalization Columns
  • norm – ndim: 1, unit: None
    • Dimension: nebins
    • Measured normalization in units of the reference model.
  • dnde – ndim: 1, unit: 1 / (cm2 s MeV)
    • Dimension: nebins
    • ucd : phot.flux.density
    • Measured differential photon flux at e_ref.
  • e2dnde – ndim: 1, unit: MeV / (cm2 s)
    • Dimension: nebins
    • ucd : phot.flux.density
    • Measured differential photon flux at e_ref, multiplied with e_ref ^ 2.
  • flux – ndim: 1, unit: 1 / (cm2 s)
    • Dimension: nebins
    • ucd : phot.count
    • Measured photon flux between e_min and e_max.
  • eflux – ndim: 1, unit: MeV / (cm2 s)
    • Dimension: nebins
    • ucd : phot.flux
    • Measured energy flux between e_min and e_max.
  • npred – ndim: 1
    • Dimension: nebins
    • Measured counts between e_min and e_max.
Error Columns

The error columns define the error and upper limit for a given representation of the normalization. In the following column specifications {NORM_REP} indicates a placeholder for the name of the normalization column (e.g. flux_err).

  • {NORM_REP}_err – ndim: 1
    • Dimension: nebins
    • Symmetric error on the normalization in the representation {NORM_REP}.
  • {NORM_REP}_errp – ndim: 1
    • Dimension: nebins
    • Positive error on the normalization in the representation {NORM_REP}.
  • {NORM_REP}_errn – ndim: 1
    • Dimension: nebins
    • Negative error on the normalization in the representation {NORM_REP}. A negative or NaN value indicates that the negative error is undefined.
  • {NORM_REP}_ul – ndim: 1
    • Dimension: nebins
    • Upper limit on the normalization in the representation {NORM_REP}. The upper limit confidence level is specified with the UL_CONF header keyword.
  • is_ul – ndim: 1, type: bool
    • Dimension: nebins
    • Boolean flag indicating whether a row should be interpreted as an upper limit. If True then one should represent the measurement with the one-sided confidence interval defined by {NORM_REP}_ul. If False then the measurement should be represented by the two-sided intervals defined by {NORM_REP}_err or {NORM_REP}_errp and {NORM_REP}_errn.
Reference Model Columns
  • ref_dnde – ndim: 1, unit: 1 / (MeV cm2 s)
    • Dimension: nebins
    • Differential flux of the reference model at e_ref.
  • ref_eflux – ndim: 1, unit: MeV / (cm2 s)
    • Dimension: nebins
    • Energy flux (integral of E times dnde) of reference model from e_min to e_max.
  • ref_flux – ndim: 1, unit: 1 / (cm2 s)
    • Dimension: nebins
    • Flux (integral of dnde) of reference model from e_min to e_max.
  • ref_dnde_e_min – ndim: 1, unit: 1 / (MeV cm2 s)
    • Dimension: nebins
    • Differential flux of reference model at e_min.
  • ref_dnde_e_max – ndim: 1, unit: 1 / (MeV cm2 s)
    • Dimension: nebins
    • Differential flux of reference model at e_max.
  • ref_npred – ndim: 1, unit: counts
    • Dimension: nebins
    • Number of photon counts of reference model.
Likelihood Columns
  • ts – ndim: 1, unit: none
    • Dimension: nebins
    • Source test statistic in each energy bin, defined as twice the difference between the best-fit and null log-likelihood values. In the asymptotic limit this is the square of the significance.
  • loglike – ndim: 1, unit: none
    • Dimension: nebins
    • Log-Likelihood value of the best-fit model.
  • loglike_null – ndim: 1, unit: none
    • Dimension: nebins
    • Log-Likelihood value of the zero normalization model.
  • {NORM_REP}_scan – ndim: 2, unit: None
    • Dimension: nebins x nnorms
    • Likelihood scan points in each energy bin in the representation {NORM_REP}.
  • dloglike_scan – ndim: 2, unit: none
    • Dimension: nebins x nnorms
    • Scan over delta LogLikelihood value vs. normalization in each energy bin. The Delta-Loglikelihood is evaluated with respect to the zero normalization model (loglike_null).
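As an illustration, a minimal sketch of evaluating the stored likelihood profile of one energy bin at an arbitrary normalization, e.g. when testing a new spectral model against the scan columns defined above:

import numpy as np

def dloglike(norm, norm_scan, dloglike_scan):
    """Interpolate the delta log-likelihood of one energy bin at a given norm.

    norm_scan, dloglike_scan : rows of the {NORM_REP}_scan and dloglike_scan
    columns for this bin (norm_scan must be sorted in ascending order).
    """
    return np.interp(norm, norm_scan, dloglike_scan)

# a joint profile for a global model is the sum over independent energy bins:
# total = sum(dloglike(pred[i], scan[i], dll[i]) for i in range(n_ebins))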

Bin-by-bin Likelihood Profiles

This page describes formats for bin-by-bin likelihood profiles as currently used in some LAT analyses. The bin-by-bin likelihood extends the concept of an SED by providing a representation of the likelihood function in each energy bin. Likelihood SED and Likelihood SED Cube are two formats for serializing bin-by-bin likelihoods to a FITS file. A Likelihood SED stores the bin-by-bin likelihood for a single source or test source position while a Likelihood SED Cube stores a sequence of bin-by-bin likelihoods (e.g. for a grid of positions or a group of sources).

In the following we describe some advantages and limitations of using bin-by-bin likelihoods. Relative to a traditional SED, the bin-by-bin likelihood retains more information about the shape of the likelihood function around the maximum. This can be important when working in the low-statistics regime, where the likelihoods are non-Gaussian and a flux value with a one-sigma uncertainty is insufficient to describe the shape of the likelihood function. Applications in which bin-by-bin likelihoods may be useful include:

  • Deriving upper limits on the global spectral distribution of a source. Likelihood SEDs can be used to construct the likelihood function for arbitrary spectral models without recomputing the experimental likelihood function. This is particularly useful for DM searches, in which one tests a large number of spectral models (e.g. for mass and annihilation channel) and recomputing the experimental likelihood function for all models would be very expensive. The bin-by-bin likelihoods are also a convenient way of distributing analysis results in a format that allows other spectral models to be easily tested. Two of the most recent LAT publications on dSph DM searches have publicly released the analysis results in this format (see 2015PhRvL.115w1301A and 2014PhRvD..89d2001A).
  • Stacking analyses that combine measurements from multiple sources or multiple epochs of observation of a single source. Forming a joint likelihood from the product of Likelihood SEDs fully preserves information in each data set and is equivalent to doing a joint fit as long as the data sets are independent.
  • Analyses combining spectral measurements from multiple experiments. Likelihoods from two or more experiments can be multiplied to derive a joint likelihood function incorporating the measurements of each experiment. As for stacking analyses, the joint likelihood approach avoids merging or averaging data or IRFs. The bin-by-bin likelihoods further allow joint analyses to be performed without having access to the data sets or tools that produced the original measurement. For an application of this approach in the context of DM searches see 2016JCAP...02..039M.
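
As an illustration of the stacking and joint-likelihood use cases above, the sketch below (file names hypothetical; it assumes that all inputs share the same energy binning and that the per-bin normalizations scale with a single global parameter, as in the DM search use case) multiplies likelihoods by summing the delta log-likelihood profiles on a common normalization grid:

    import numpy as np
    from astropy.table import Table
    from scipy.interpolate import interp1d

    files = ["sed_src_a.fits", "sed_src_b.fits"]  # hypothetical inputs
    norms = np.logspace(-2, 2, 200)               # common normalization grid

    joint_dloglike = np.zeros_like(norms)
    for fname in files:
        sed = Table.read(fname)
        for i in range(len(sed)):
            f = interp1d(sed["norm_scan"][i], sed["dloglike_scan"][i],
                         bounds_error=False, fill_value="extrapolate")
            joint_dloglike += f(norms)

    # The maximum of joint_dloglike gives the joint best-fit normalization,
    # and twice the maximum gives the joint test statistic.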

There are a few important caveats to bin-by-bin likelihoods which may limit their use for certain applications:

  • Large correlations between the normalizations of two or more model components (e.g. when the spatial models are partially degenerate) can limit the utility of this approach. Although such correlations can be accounted for by profiling the corresponding nuisance parameters, this may result in unphysical background models with large bin-to-bin fluctuations in the model amplitude. One technique to avoid this issue (see 2015PhRvD..91j2001B and 2016PhRvD..93f2004C) is to apply a Gaussian prior that constrains the spectral distribution of the background components to lie within a certain range of the global spectral model of that source (computed without the test source).
  • Because the likelihoods in each energy bin are calculated independently, this technique cannot fully account for bin-to-bin correlations caused by energy dispersion. The effect of energy dispersion can be corrected to first order by scanning the likelihood with a spectral model (e.g. a power law with index 2) that is close in shape to the spectral models of interest. However, in analyses where the energy response matrix is particularly broad or non-diagonal, the systematic errors arising from the approximate treatment of energy dispersion may exceed the statistical errors. In LAT analyses energy dispersion can become a significant effect when using data below 100 MeV (see LAT_edisp_usage). However, when using an index of 2.0 and considering energies above 100 MeV, the spectral bias is less than 3% for models with indices between 1 and 3.5.
Likelihood SED

The likelihood SED is a representation of the spectral energy distribution of a source that contains a likelihood for the source normalization in each energy bin. This format is a special case of the more general SED format. Depending on the requirements of the analysis, the likelihoods can be evaluated with either profiled or fixed nuisance parameters. The likelihood SED can be used in the same way as a traditional SED but contains additional information about the shape of the likelihood function around the maximum. A 2D visualization of the likelihood functions can be produced by creating a colormap with intensity mapped to the likelihood value:

[Figure: 2D likelihood visualizations for a low-significance source (left) and a high-significance source (right).]

In the following we use nebins to designate the number of energy bins and nnorms to designate the number of points in the normalization scan. The format is a BINTABLE with one row per energy bin containing the columns listed below.

The best-fit model amplitudes, errors, and upper limits are all normalized to a reference spectral model. The ref columns define the amplitude of the reference model in different units. The reference model amplitudes are arbitrary and could for instance be set to the best-fit amplitude in each energy bin. The norm columns contain the best-fit value, its errors, and the upper limit in units of the reference model amplitude. Unit conversion of the norm columns can be performed by doing a row-wise multiplication with the respective ref column.
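
For example, with the table read into memory this row-wise multiplication is a one-liner per column (a minimal sketch; file name hypothetical, and the optional norm_ul column is assumed to be present):

    from astropy.table import Table

    sed = Table.read("likelihood_sed.fits")  # hypothetical file name

    # Convert the dimensionless norm columns to physical units by a
    # row-wise multiplication with the reference model amplitudes
    dnde = sed["norm"] * sed["ref_dnde"]         # differential flux
    dnde_err = sed["norm_err"] * sed["ref_dnde"]
    flux_ul = sed["norm_ul"] * sed["ref_flux"]   # integral flux upper limit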

Sample Files

See the likelihood files in Sample Files.

Header Keywords
  • SED_TYPE
    • SED type string. Should be set to likelihood.
  • UL_CONF, optional
    • Confidence level of the upper limit (range: 0 to 1) of the value in the norm_ul column.
Columns

The columns listed here are a subset of the columns defined in the SED format. See Columns for the full column specifications.

Required Columns
  • e_min – ndim: 1, Dimension: nebins
  • e_max – ndim: 1, Dimension: nebins
  • e_ref – ndim: 1, Dimension: nebins
  • ref_dnde – ndim: 1, Dimension: nebins
  • ref_eflux – ndim: 1, Dimension: nebins
  • ref_flux – ndim: 1, Dimension: nebins
  • ref_npred – ndim: 1, Dimension: nebins
  • norm – ndim: 1, Dimension: nebins
  • norm_err – ndim: 1, Dimension: nebins
  • norm_scan – ndim: 2, Dimension: nebins x nnorms
  • ts – ndim: 1, Dimension: nebins
  • loglike – ndim: 1, Dimension: nebins
  • dloglike_scan – ndim: 2, Dimension: nebins x nnorms
Optional Columns
  • ref_dnde_e_min – ndim: 1, Dimension: nebins
  • ref_dnde_e_max – ndim: 1, Dimension: nebins
  • norm_errp – ndim: 1, Dimension: nebins
  • norm_errn – ndim: 1, Dimension: nebins
  • norm_ul – ndim: 1, Dimension: nebins
Likelihood SED Cube

The Likelihood SED Cube is a format for storing a sequence of Likelihood SEDs in a single table. The format defines a file with two BINTABLE HDUs: SCANDATA and EBOUNDS. SCANDATA has one row per Likelihood SED while EBOUNDS has one row per energy bin. Table rows in SCANDATA can be mapped to a list of sources, spatial pixels, or observation epochs. Because the row mapping is not defined by the format itself, additional columns can be added to SCANDATA that define the mapping of each row. Examples would be columns for source name, pixel coordinate, or observation epoch.

In the following we use nrows to designate table rows, nebins to designate the number of energy bins and nnorms to designate the number of points in the normalization scan. As for the Likelihood SED format, columns that contain norm are expressed in units of the reference model amplitude. These can be multiplied by ref_eflux, ref_flux, ref_dnde, or ref_npred columns in the EBOUNDS HDU to get the normalization in the respective units.
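
Since the norm columns live in SCANDATA while the reference amplitudes live in EBOUNDS, this multiplication involves broadcasting the per-bin reference values across table rows, e.g. (a minimal sketch; file name hypothetical):

    import numpy as np
    from astropy.table import Table

    scandata = Table.read("likelihood_sed_cube.fits", hdu="SCANDATA")
    ebounds = Table.read("likelihood_sed_cube.fits", hdu="EBOUNDS")

    norm = np.asarray(scandata["norm"])         # shape (nrows, nebins)
    ref_flux = np.asarray(ebounds["REF_FLUX"])  # shape (nebins,)

    flux = norm * ref_flux  # broadcasts over rows -> shape (nrows, nebins)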

Sample FITS files are provided in Sample Files.

SCANDATA Table

The SCANDATA HDU is a BINTABLE with the following columns. The columns listed here are a subset of the columns in the SED format. Relative to the 1D SED formats, the dimensionality of all columns is increased by one, with the first dimension (rows) mapping to spatial pixels, sources, or epochs. See Columns for the full column specifications.

Header Keywords
  • UL_CONF, optional
    • Confidence level of the upper limit given in the norm_ul column.
Required Columns
  • dloglike_scan – ndim: 3, Dimension: nrows x nebins x nnorms
  • norm_scan – ndim: 3, Dimension: nrows x nebins x nnorms
  • norm – ndim: 2, Dimension: nrows x nebins
  • norm_err – ndim: 2, Dimension: nrows x nebins
  • ts – ndim: 2, Dimension: nrows x nebins
  • loglike – ndim: 2, Dimension: nrows x nebins
Optional Columns
  • ref_npred – ndim: 2, Dimension: nrows x nebins
  • norm_errp – ndim: 2, Dimension: nrows x nebins
  • norm_errn – ndim: 2, Dimension: nrows x nebins
  • norm_ul – ndim: 2, Dimension: nrows x nebins
  • bin_status – ndim: 2, unit: None
    • Dimension: nrows x nebins
    • Fit status code. 0 = OK, >0 = Not OK
EBOUNDS Table

The EBOUNDS HDU is a BINTABLE with one row per energy bin and the following columns. The columns listed here are a subset of the columns in the SED format. See Columns for the full column specifications. Note that for backwards compatibility with the existing EBOUNDS table convention (e.g. as used for WCS counts cubes), column names are upper case.

Required Columns
  • E_MIN – ndim: 1, unit: keV, Dimension: nebins
  • E_REF – ndim: 1, unit: keV, Dimension: nebins
  • E_MAX – ndim: 1, unit: keV, Dimension: nebins
  • REF_DNDE – ndim: 1, Dimension: nebins
  • REF_EFLUX – ndim: 1, Dimension: nebins
  • REF_FLUX – ndim: 1, Dimension: nebins
Optional Columns
  • REF_DNDE_E_MIN – ndim: 1, Dimension: nebins
  • REF_DNDE_E_MAX – ndim: 1, Dimension: nebins
  • REF_NPRED – ndim: 1, Dimension: nebins
TSCube Output Format

Recent releases of the Fermi ScienceTools provide a gttscube application that fits a test source on a grid of spatial positions within the ROI. At each test source position this tool calculates the following information:

  • TS and best-fit amplitude of the test source.
  • A likelihood SED.

The output of the tool is a FITS file containing a Likelihood SED Cube in which each of the nrows table rows maps to a pixel in the grid scan. The PRIMARY HDU contains the same output as gttsmap – a 2-dimensional FITS IMAGE with the test source TS evaluated at each position. The primary fit results are contained in the following BINTABLE HDUs:

  • A SCANDATA Table containing the likelihood SEDs for each spatial pixel.
  • A FITDATA Table containing fit results for the reference model at each spatial pixel over the full energy range.
  • An EBOUNDS Table containing the bin definitions and the amplitude of the reference model.

The mapping of rows to pixels is defined by the WCS header keywords in the SCANDATA HDU. Following the usual FITS convention, both tables use column-major ordering for mapping rows to pixel indices, i.e. the x (column) index varies fastest.
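
A minimal sketch of this mapping (file name hypothetical; it reads the grid shape from the PRIMARY image and reshapes a per-row column back into map geometry):

    from astropy.io import fits

    with fits.open("tscube.fits") as hdul:    # hypothetical file name
        ts_image = hdul["PRIMARY"].data       # 2D TS map, shape (ny, nx)
        fitdata = hdul["FITDATA"].data

        ny, nx = ts_image.shape

        # Table row r corresponds to pixel indices (ix, iy) with
        # r = ix + nx * iy, so a per-row column reshapes back into
        # map geometry with numpy's default (row-major) reshape:
        fit_ts_map = fitdata["fit_ts"].reshape(ny, nx)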

Here is the list of HDUs:

TS Cube HDUs
  HDU  HDU Type  HDU Name  Description
  0    IMAGE     PRIMARY   TS map of the region using the test source
  1    BINTABLE  SCANDATA  Table with the data from the likelihood vs. normalization scans. Follows the format specification given in SCANDATA Table.
  2    BINTABLE  FITDATA   Table with the data from the reference model fits.
  3    BINTABLE  BASELINE  Parameters and covariances of the baseline fit.
  4    BINTABLE  EBOUNDS   Energy bin edges, fluxes, and NPREDs for the test source in each energy bin. Follows the format specification given in EBOUNDS Table.
FITDATA Table

The FITDATA HDU is a BINTABLE with 1 row per spatial pixel (nrows) and the following columns:

  • fit_norm – ndim: 1, unit: None
    • Dimension: nrows
    • Best-fit normalization for the global model in units of the reference model amplitude.
  • fit_norm_err – ndim: 1, unit: None
    • Dimension: nrows
    • Symmetric error on the global model normalization in units of the reference model amplitude.
  • fit_norm_errp – ndim: 1, unit: None
    • Dimension: nrows
    • Positive error on the global model normalization in units of the reference model amplitude.
  • fit_norm_errn – ndim: 1, unit: None
    • Dimension: nrows
    • Negative error on the global model normalization in units of the reference model amplitude.
  • fit_norm_ul – ndim: 1, unit: None
    • Dimension: nrows
    • Upper limit on the global model normalization in units of the reference model amplitude.
  • fit_ts – ndim: 1, unit: None
    • Dimension: nrows
    • Test statistic of the best-fit global model.
  • fit_status – ndim: 1, unit: None
    • Dimension: nrows
    • Status code for the fit. 0 = OK, >0 = Not OK

1D counts spectra

This section describes a data format for 1D counts spectra and associated reduced responses (ARF and RMF) that has been used for decades in X-ray astronomy, and is also used in VHE gamma-ray astronomy.

The EVENTS and 2D IRFs can be transformed into a 1D counts vector and 1D IRFs that can serve as input to general X-ray spectral analysis packages such as Sherpa. For an introduction to this so-called OGIP data format please refer to the official Documentation on HEASARC.

The following sections only highlight differences and modifications made to the OGIP standard in order to meet the requirements of gamma-ray astronomy.

PHA

The OGIP standard PHA file format is described here.

TODO: Should an EBOUNDS extension be added to the PHA file (channels -> energy)? In OGIP this info has to be extracted from the RMF file.

The values of the following header keywords need some attention when using them for IACT analysis.

  • BACKSCAL
    • For now it is assumed that the exposure ratio between signal and background counts does not depend on energy; this keyword must therefore be set to a single scalar value

    • The BACKSCAL keywords in the PHA and the BKG file must be set so that

      \[\alpha = \frac{\mathrm{PHA}_{\mathrm{backscal}}}{\mathrm{BKG}_{\mathrm{backscal}}}\]
    • It is recommended to set the BACKSCAL keyword to 1 in the PHA file and to \(1/\alpha\) in the BKG file
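
As an illustration, this recommendation can be applied with astropy as follows (a sketch; the alpha value and file names are placeholders, and the spectra are assumed to be stored in the standard OGIP SPECTRUM extension):

    from astropy.io import fits

    alpha = 0.1  # placeholder: on/off exposure ratio from the analysis

    # Recommended convention: BACKSCAL = 1 in the PHA (signal) file ...
    with fits.open("pha.fits", mode="update") as hdul:  # hypothetical file
        hdul["SPECTRUM"].header["BACKSCAL"] = 1.0

    # ... and BACKSCAL = 1 / alpha in the BKG (background) file
    with fits.open("bkg.fits", mode="update") as hdul:  # hypothetical file
        hdul["SPECTRUM"].header["BACKSCAL"] = 1.0 / alpha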

Additional header keywords that can be stored in the PHA header for IACT analysis are listed below.

  • OFFSET – type: float, unit: deg
    • Distance between observation position and target of a spectral analysis
  • MUONEFF – type: float
    • Muon efficiency of the observation
  • ZENITH – type: tbd, unit: deg
    • Zenith angle of the observation
  • ON-RAD – type: float, unit: deg
    • Radius of the spectral extraction region
    • Defines the spectral extraction region together with the standard keywords RA-OBJ and DEC-OBJ
BKG

The values of the following header keywords need some attention when using them for IACT analysis.

  • BACKSCAL
    • It is recommended to set the BACKSCAL keyword to \(1/\alpha\) in the BKG file (see above)
ARF

The OGIP standard ARF file format is described here.

Additional header keywords that can be stored in the ARF header for IACT analysis are listed below.

  • LO_THRES – type: tbd, unit: TeV
    • Low energy threshold of the analysis
  • HI_THRES – type: tbd, unit: TeV
    • High energy threshold of the analysis
RMF

The OGIP standard RMF file format is described here.

How an RMF file can be extracted from an IACT 2D energy dispersion file is explained in Energy Dispersion.

Light curves

For light curves, we recommend storing the information in a format as similar as possible to the one used for Spectra, and specifically SED.

For measurements at a given time, a TIME column should be added; for measurements over a given time interval, TIME_MIN and TIME_MAX columns should be added.

As explained in Time, times should be given as 64-bit floats, using the FITS time standard to specify a reference time point. Note that this allows some flexibility, and e.g. the commonly used “MJD values” for light curves are supported via these header keys:

MJDREFI =  0
MJDREFF =  0
TIMEUNIT= 'd'

Light curve producers are highly encouraged to always give the TIMESYS keyword. For archival data this is often not given, and it is then unclear whether the times are e.g. in UTC, TT, or some other TIMESYS.
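
As an illustration, a minimal flux-point light curve with MJD time intervals could be written like this (a sketch; all values are placeholders and the flux columns follow the SED column conventions):

    from astropy.table import Table

    lc = Table()
    lc["TIME_MIN"] = [55197.0, 55198.0]  # placeholder MJD values
    lc["TIME_MAX"] = [55198.0, 55199.0]
    lc["flux"] = [1.2e-11, 9.8e-12]      # placeholder flux points
    lc["flux_err"] = [2.1e-12, 1.9e-12]
    lc["flux"].unit = "1 / (cm2 s)"
    lc["flux_err"].unit = "1 / (cm2 s)"

    # Header keywords: MJD reference time, day unit, explicit TIMESYS
    lc.meta["MJDREFI"] = 0
    lc.meta["MJDREFF"] = 0
    lc.meta["TIMEUNIT"] = "d"
    lc.meta["TIMESYS"] = "UTC"

    lc.write("lightcurve.fits", overwrite=True)  # hypothetical output name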

This is a very preliminary description, based on the discussion here: https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/pull/61

If someone has time to provide a more detailed description and to produce example files in FITS or possibly also ECSV format, that would be highly welcome.