Binary Structured Data Format

The Binary Structured Data Format (BSDF) is an open specification for serializing (scientific) data, for the purpose of storage and inter-process communication.

It's designed to be a simple format, making it easy to implement in many programming languages. However, the format allows implementations to support powerful mechanics such as lazy loading of binary data, and streamed reading/writing.

BSDF is a binary format; by giving up on human readability, BSDF can be simple, compact and fast. See the full specification, or how it compares to other formats.

The source code is at Gitlab.

Data types and extensions

BSDF supports 8 base types: null, booleans, integers, floats, strings/text, (heterogeneous) lists, mappings (i.e. dictionaries), and binary blobs. Integers and floats represent 64-bit numbers, but can be encoded using fewer bytes. Binary blobs can optionally be compressed (zlib or bz2), can have checksums, and can be resized.

Via an efficient extension mechanism, other data types (including custom ones) can be serialized. The standard extensions work out of the box, supporting e.g. nd-arrays and complex numbers.

Status

The format is complete, except for a few details such as how to deal with blob checksums. All implementations comply with the format and are well tested. We could do with implementations in additional languages though!

Implementations

Implementations currently exist for multiple languages. Each implementation is continuously tested to ensure compatibility.

We'd like implementations for other languages (such as R and Julia). BSDF is designed to be easy to implement; perhaps you want to contribute?

We aim for the implementations to have similar APIs: a class whose instances hold extensions and options, and has encode(), decode(), save(), load(), and add_extension() methods. Optionally, an implementation can provide convenience functions.

There is also a command line interface that can be used to e.g. create and view BSDF files.

Installation

See the specific implementations for detailed installation instructions. Most implementations consist of a single file.

Examples

In Python:

>>> import bsdf
>>> b = bsdf.encode(['just some objects', {'foo': True, 'bar': None}, 42.001])
>>> b
b'BSDF\x02\x00l\x03s\x11just some objectsm\x02\x03fooy\x03barvd\xe3\xa5\x9b\xc4 \x00E@'
>>> len(b)
48
>>> bsdf.decode(b)
['just some objects', {'foo': True, 'bar': None}, 42.001]

For more Python examples, see the Python example notebook.

In JavaScript:

> bsdf = require('bsdf.js')
{ encode: [Function: bsdf_encode],
  decode: [Function: bsdf_decode],
  BsdfSerializer: [Function: BsdfSerializer],
  standard_extensions: ...}
> b = bsdf.encode(['just some objects', {foo: true, bar: null}, 42.001])
ArrayBuffer { byteLength: 48 }
> bsdf.decode(b)
[ 'just some objects', { foo: true, bar: null }, 42.001 ]

In Matlab / Octave:

>> bsdf = Bsdf()
>> b = bsdf.encode({'just some objects', struct('foo', true, 'bar', []), 42.001});
>> size(b)
ans =
   48    1
>> bsdf.decode(b)
ans =
{
  [1,1] = just some objects
  [1,2] =

    scalar structure containing the fields:

      foo = 1
      bar = [](0x0)

  [1,3] =  42.001
}

It is worth noting that although different languages may represent data types in slightly different ways, the underlying bytes in BSDF are the same. This makes BSDF suited for inter-language communication.

License

In principle, all implementations in the BSDF repository use the 2-clause BSD license (see LICENSE for details), unless otherwise specified. All code is liberally licensed (BSD- or MIT-like).


The BSDF Command Line Interface

BSDF has a command line interface (CLI) for performing simple tasks, such as inspecting, converting and creating BSDF files. The CLI is part of the Python implementation, so run pip install bsdf to start using it.

Using the CLI

After installation, depending on your Python setup, the CLI may be available as the bsdf command. If this is the case, you can run:

$ bsdf ...

If this is not the case, or if you want to target a specific Python version, use:

$ python -m bsdf ...

Getting help

To get started, run the help command:

$ bsdf help

which yields:

Command line interface for the Binary Structured Data Format.
See http://bsdf.io for more information on BSDF.

usage: bsdf command [options]

Available commands:
  convert - Convert one format into another (e.g. JSON to BSDF).
  create  - Create a BSDF file from data obtained by evaluation Python code.
  help    - Show the help text.
  info    - Print meta information about the given BSDF file.
  version - Print the version of the current Python implementation.
  view    - View the content of a given BSDF file.

Run 'bsdf help command' or 'bsdf command --help' to learn more.

Dive deeper using e.g.

$ bsdf help view

Example

$ bsdf create foo.bsdf '["xx", 4, None, [3, 4, 5]*3]'
$ bsdf info foo.bsdf
BSDF info for: C:\dev\pylib\bsdf\python\foo.bsdf
  file_name:     foo.bsdf
  file_size:     45
  file_mtime:    2017-12-21 15:21:41
  is_valid:      true
  file_version:  2.1
$ bsdf view foo.bsdf
[ list with 4 elements
  'xx'
  4
  null
  [ list with 9 elements
    3
    4
    5
    3
    4
    5
    3
    4
    5
  ]
]
$ bsdf view foo.bsdf --depth=1
[ list with 4 elements
  'xx'
  4
  null
  [ list with 9 elements ]
]

Comparing BSDF with other formats

The question that arises with any new format: Why, oh Why? Why yet another format!?

In short, there was no format that could serialize nd-array data well and also work well on the web. The realization that HDF5 is not so great, a strong need to send scientific data between Python and JavaScript, and a repeated annoyance with JSON have nudged me to create BSDF.

This page compares BSDF with other formats, and explains why, in my view, these formats were insufficient for my needs.

BSDF vs JSON

Although JSON is very widely used, it has several limitations:

  • JSON's inability to encode nan and inf can be painful.
  • No support for binary data or nd-arrays (base64 is a compromise worth avoiding).
  • It's kind of human readable, but very verbose, and not easy to write (e.g. a comma after the last item in a list breaks things).
  • Many JSON implementations allow extending the types, but this involves an extra function call for each element, which degrades the performance.

BSDF vs UBJSON et al.

Binary formats commonly used on the web that were considered are ubjson, msgpack and bson. Most are rather web-oriented, or adhere strictly to JSON compatibility (e.g. no nan). Most do not support typed arrays, let alone nd-arrays, and/or decode such arrays in JavaScript as regular arrays instead of array buffers. In short, none of these seemed to provide the flexibility that a scientific data format needs.

BSDF differs from most of them by its flexibility for encoding binary data, and its simple extension mechanism.

It's worth noting that BSDF does not support typed arrays as one of its base types, but the extension for typed nd-arrays is a standard extension available in most implementations.

BSDF vs HDF5

HDF5 is a popular format for scientific data, but there are also good reasons to avoid it, as explained in e.g. the paper on ASDF and this blog post. Summarizing:

  • HDF5 is a complex specification and (therefore) there is really just one implementation that actually works.
  • The implementation sometimes has bugs or performance issues, but there are no alternatives.
  • Not human readable, and no other tools for inspection except that one implementation.
  • No proper mappings (dicts) and lists.

HDF5 is certainly more flexible, e.g. with regard to lazy loading of parts of compressed data. However, BSDF does support resizing of binary data, in-place editing, lazy loading, and streamed reading and writing.

BSDF vs ASDF

The ASDF format has goals that partly overlap with the purpose of BSDF:

  • intrinsic hierarchical structure
  • human readable
  • based on existing data format (yaml)
  • support for references (also to external objects)
  • efficient updating
  • machine independent, structured data, ndarrays
  • support for writing (and reading) streams
  • explicit versioning
  • explicit extensibility without interference
  • support for validation with schemas

ASDF was seriously considered before the development of BSDF started. The idea of a human readable format is appealing, but ...

  • Yaml is a rather ill defined format that is hard to parse, which is probably why the parser is so slow.
  • Data that consist of many elements (but not so much blobs) will be encoded inefficiently.
  • Many text editors won't deal well with huge text files.
  • If the text is edited, byte alignments are likely to break.
  • It makes the format more complex (you basically have two formats).

This is why BSDF drops human readability, gaining a format that is simple, compact, and fast to parse. This is not to say that ASDF did it wrong; it is very suited for what it was designed for. But BSDF is more suited for e.g. inter process communication.

BSDF vs Arrow

The goals of Apache Arrow bear similarities with BSDF, with e.g. a clear standard and zero copy reads. However, it's rather focussed on columnar data (where BSDF supports nd-arrays), and seems oriented at compiled languages, i.e. less flexible. Although the specification looks easy to read, the Python implementation is much larger than BSDF's 800 or so lines of code. It's also not pure Python, making it nontrivial to install on less common Python versions/implementations.

BSDF vs NPZ

Numpy has a builtin way to encode typed arrays. However, this is limited to arrays (no metadata), and rather specific to Python.

BSDF vs SSDF (and BSDF v1)

Around 2011 I developed a human readable file format called SSDF, suited for storing hierarchical data, similar to JSON, but with support for nan and inf. It also supports nd-arrays, via base64 encoding and zlib compression. I've used this in several (scientific) projects (e.g. it was used in the Pyzo IDE to store config data). Although it does serve its purpose, it's not terribly good for large binary datasets. I also kept coming back in need of a format to send binary data to/from JS, where compression is a problem.

At some point I developed a binary equivalent of SSDF that was fully compatible, but stored binary data more effectively. The current BSDF format can be seen as its successor, being both simpler and more extensible. This is also why BSDF's version number starts at 2.

I am currently of the opinion that a format that is good at binary data cannot also be good at being a human readable (config) format. See e.g. toml for a well-readable format.

Contributing to BSDF

There are several ways that you can contribute to BSDF: from reporting bugs in the issue tracker, to providing fixes and improvements, or even contributing new implementations.

Organization of the code

Since BSDF is designed to be simple, implementations are usually restricted to a single module. The BSDF Gitlab repo contains implementations for several languages, organized in sub directories. This allows testing each implementation using a "test service", and ensures compatibility between the different implementations.

Development dependencies

The tooling around BSDF is implemented in Python. For development, you need Python 3.x and the invoke library (pip install invoke).

To run tasks such as tests, run invoke from the root repo to get started.

Workflow

To contribute an enhancement or new implementation, please first open an issue to start the discussion. The actual code will be contributed via pull requests.

It is expected that each implementation will be more or less maintained by its own group of contributors.

Code of conduct

BSDF does not have an official code of conduct yet, but let's just say that we expect respect from and towards all contributors, and will not tolerate discrimination or trolling.

Extending BSDF

BSDF can encode special kinds of data by providing the serializer with extensions. How users specify extensions is specific to the implementation, but they will typically consist of 4 elements:

  • A name to identify it with. This will be encoded along with the data, so better keep it short, although custom extensions are best prefixed with the context (e.g. 'mylibrary.myextension'), to avoid name clashes.
  • A type and/or a match function, so that the BSDF encoder can determine what objects must be serialized.
  • An encoder function to convert the special object to more basic objects.
  • A decoder function to reconstruct the special object from the basic objects.

How it works

Extensions encode high-level data types into more basic data types, such as the base BSDF types, or types supported by other extensions. Upon decoding, the extension reconstructs the high-level data from the "lower level" data. When an extension is not available during decoding, a warning is produced, and the object is represented in its underlying basic form.

Extensions add very little overhead in speed (unlike e.g. JSON). In terms of memory, each object being converted needs a little extra memory to encode the extension's name.
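
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any BSDF implementation: the Ext record and the to_basic()/from_basic() helpers are hypothetical names, and only the top-level value is converted (a real encoder also applies extensions to items inside lists and mappings).

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Ext:
    """Hypothetical extension record: a name plus match/encode/decode functions."""
    name: str
    match: Callable[[Any], bool]
    encode: Callable[[Any], Any]
    decode: Callable[[Any], Any]


def to_basic(value: Any, extensions: List[Ext]) -> Tuple[Optional[str], Any]:
    """Return (extension name, basic value); the name is None if nothing matched."""
    for ext in extensions:
        if ext.match(value):
            return ext.name, ext.encode(value)
    return None, value


def from_basic(name: Optional[str], basic: Any, extensions: List[Ext]) -> Any:
    """Rebuild the high-level value, or warn and keep the basic form."""
    if name is None:
        return basic
    for ext in extensions:
        if ext.name == name:
            return ext.decode(basic)
    warnings.warn(f"BSDF extension {name!r} not available; using basic form")
    return basic


# Example: a set, stored as a sorted list under the hypothetical name 'demo.set'.
set_ext = Ext('demo.set', lambda v: isinstance(v, set), sorted, set)
name, basic = to_basic({3, 1, 2}, [set_ext])
```

Decoding with an empty extension list returns [1, 2, 3] and emits a warning, mirroring the fallback behaviour described above.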

Kinds of extensions

Everyone can write their own extension and use it in their own work.

The purpose of this document is to specify ways to convert common data types, and how these extensions should be named. If everyone adheres to these specifications, data will be easier to share.

BSDF also defines a small set of standard extensions, which users are strongly encouraged to follow, and which all BSDF implementations are encouraged to support by default.

Status

This is a work in progress and the specifications below are subject to change. The standardization of a base set of extensions should settle soonish after the BSDF format itself has stabilized.

Standard extensions

Complex numbers

  • name: "c"
  • encoding: a list with two elements (the real and the imaginary part).
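
As a sketch of what a "c" extension boils down to (the function names here are hypothetical, not taken from an implementation), the encoder reduces a complex number to a two-element list and the decoder reverses that:

```python
def encode_complex(z: complex) -> list:
    """Payload for the 'c' extension: [real part, imaginary part]."""
    return [z.real, z.imag]


def decode_complex(parts: list) -> complex:
    """Rebuild a complex number from a 'c' extension payload."""
    real, imag = parts
    return complex(real, imag)


parts = encode_complex(3 + 4j)  # [3.0, 4.0]
```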

N-dimensional arrays

  • name: "ndarray"
  • encoding: a dict with elements:
    • 'dtype', a string that specifies the data type. Minimal support should be 'uint8', 'int8', 'uint16', 'int16', 'uint32', 'int32', 'float32', and preferably 'uint64', 'int64' and 'float64'.
    • 'shape', a list with as many elements (integers) as the array has dimensions. The first changing dimension first.
    • 'data', a blob of bytes representing the contiguous data.

We might add an "order" field at a later point. This will need to be investigated/discussed further. Until then, C-order (row-major) should be assumed where it matters.
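
A minimal sketch of producing and reading an "ndarray" payload without numpy, using the standard library array module. Assumptions: the dtype-to-typecode mapping below is illustrative, values are given flat in C order, and the data bytes are stored little endian like the other words in the format. The helper names are hypothetical.

```python
import array
import math
import sys

# Assumed mapping from BSDF dtype names to array typecodes, with the expected sizes.
TYPECODES = {'uint8': 'B', 'int8': 'b', 'uint16': 'H', 'int16': 'h',
             'uint32': 'I', 'int32': 'i', 'uint64': 'Q', 'int64': 'q',
             'float32': 'f', 'float64': 'd'}
ITEMSIZES = {'uint8': 1, 'int8': 1, 'uint16': 2, 'int16': 2, 'uint32': 4,
             'int32': 4, 'uint64': 8, 'int64': 8, 'float32': 4, 'float64': 8}


def _make_array(dtype, values=()):
    """Create an array.array, refusing typecodes whose size differs on this platform."""
    arr = array.array(TYPECODES[dtype], values)
    if arr.itemsize != ITEMSIZES[dtype]:
        raise TypeError(f"no {ITEMSIZES[dtype]}-byte typecode for {dtype} here")
    return arr


def ndarray_encode(values, shape, dtype):
    """Flat values in C order -> payload dict for the 'ndarray' extension."""
    if len(values) != math.prod(shape):
        raise ValueError("number of values does not match the shape")
    arr = _make_array(dtype, values)
    if sys.byteorder == 'big':
        arr.byteswap()  # assumption: store the data little endian
    return {'dtype': dtype, 'shape': list(shape), 'data': arr.tobytes()}


def ndarray_decode(payload):
    """Payload dict -> (flat array.array in C order, shape tuple)."""
    arr = _make_array(payload['dtype'])
    arr.frombytes(payload['data'])
    if sys.byteorder == 'big':
        arr.byteswap()
    if len(arr) != math.prod(payload['shape']):
        raise ValueError("data size does not match the shape")
    return arr, tuple(payload['shape'])


# A 2x3 float32 array with rows [1, 2, 3] and [4, 5, 6].
payload = ndarray_encode([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3), 'float32')
```

With numpy available, an implementation would more likely use the array's own dtype, shape and tobytes(); the stdlib version just keeps the sketch dependency-free.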

Other extensions

2D image data

  • name: 'image2d'
  • encoding: a dict with elements:
    • array: an ndarray with 2 or 3 dimensions
    • meta: a dict with arbitrary data

If the data is 3D, the third dimension represents the color channels (1: L, 2: LA, 3: RGB or 4: RGBA).

3D image data

  • name: 'image3d'
  • encoding: a dict with elements:
    • array: an ndarray with 3 or 4 dimensions
    • meta: a dict with arbitrary data

If the data is 4D, the fourth dimension represents the color channels (1: L, 2: LA, 3: RGB or 4: RGBA).
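
The channel rule for image2d and image3d can be expressed as one small helper. This is a sketch under two assumptions not spelled out above: the channel dimension, when present, is the last one, and an array without a channel dimension holds a single luminance (L) channel. The function name is hypothetical.

```python
CHANNEL_MODES = {1: 'L', 2: 'LA', 3: 'RGB', 4: 'RGBA'}


def color_mode(shape, spatial_ndim):
    """Color mode for an image array; spatial_ndim is 2 for image2d, 3 for image3d."""
    if len(shape) == spatial_ndim:
        return 'L'  # assumption: no channel dimension means one luminance channel
    if len(shape) == spatial_ndim + 1 and shape[-1] in CHANNEL_MODES:
        return CHANNEL_MODES[shape[-1]]  # assumption: channels are the last dimension
    raise ValueError(f"unsupported shape {shape} for a {spatial_ndim}D image")
```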

The BSDF format specification

This document applies to BSDF format VERSION = 2.2.

Purpose and features

The purpose of BSDF is to provide a data format that is ...

  • easy to implement, such that it can easily spread to other programming languages.
  • suitable for working with binary (scientific) data.
  • suitable for inter process communication and the web.

This has resulted in the following features:

  • A binary format that has a simple specification.
  • Language agnostic and machine independent.
  • Compact storage.
  • Fast encoding and decoding. E.g. the pure Python implementation has a respectable speed, and can be made faster via e.g. a C implementation.
  • Support for binary blobs, in uncompressed format or compression with zlib or bz2.
  • Uses data types that are widely supported in most languages.
  • Provides a mechanism to easily convert to/from special data types, with minimal effect on performance, also across languages.
  • Data can be read and written without seek operations (e.g. to allow (streamed) reading from remote resources).
  • Zero copy reads (in uncompressed data, bytes are aligned).
  • Implementations can provide direct access to blobs via a file-like object for lazy loading or efficient updating.
  • Provides a way to stream data (e.g. as a list at the end of the file that can simply be appended to).

Also see how BSDF compares to other formats.

Minimal implementation

A minimal BSDF implementation must support:

  • the basic data types: null, bool, int, float, string, list, mapping, and uncompressed binary blobs.
  • reading (closed and unclosed) streams (at the end of a data structure).
  • preferably most standard extensions.

Implementations are encouraged to support:

  • user-defined extensions.
  • compressed binary blobs (zlib and bz2).

Further, implementations can be made more powerful by supporting:

  • Lazy loading of blobs.
  • Editing of (uncompressed) blobs.
  • Lazy loading of streams.
  • Deferred writing of streams.

The format

Each data value is identified using a 1 byte character in the ASCII range. If this identifier is a capital letter (smaller than ASCII 95), it means that it's a value to be converted via an extension. If so, the next item is a string (see below for its encoding) representing the extension name. Next is the data itself. All words are stored in little endian format.
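
A sketch of how an extension-converted value could be laid out, using a complex number through the "c" extension. Assumption: the extension name is written as a size item plus UTF-8 bytes without an s identifier, the same way mapping keys appear in the output of the Python example near the top of this page. The helper names are hypothetical.

```python
import struct


def encode_size(n: int) -> bytes:
    """Size item: one byte below 251, else 253 followed by a uint64."""
    return struct.pack('<B', n) if n < 251 else struct.pack('<BQ', 253, n)


def ext_type_id(type_char: bytes, ext_name: str) -> bytes:
    """Capitalized identifier of the basic value, followed by the extension name."""
    name = ext_name.encode('utf-8')
    return type_char.upper() + encode_size(len(name)) + name


# 3+4j via the 'c' extension: a list ('l' becomes 'L') of two float64 values.
encoded = (ext_type_id(b'l', 'c') + encode_size(2)
           + b'd' + struct.pack('<d', 3.0)
           + b'd' + struct.pack('<d', 4.0))
```

A decoder that sees an uppercase identifier reads the name first, decodes the data as the lowercase type, and then hands it to the named extension.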

Encoding of size

Sizes (of e.g. lists, mappings, strings, and blobs) are encoded as follows: if the size is smaller than 251, a single byte (uint8) is used. Otherwise, the first byte is 253, and the next 8 bytes represent the size using an unsigned 64bit integer. (The bytes 254 and 255 are used to identify (closed and unclosed) streams, and 251-252 are reserved.)
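
The size rules can be sketched as an encoder and a decoder; the decoder also recognizes the stream markers 254 and 255 and rejects the reserved values 251 and 252. The function names are hypothetical.

```python
import io
import struct


def encode_size(n: int) -> bytes:
    """One byte for sizes below 251, otherwise 253 followed by a uint64 (little endian)."""
    if n < 251:
        return struct.pack('<B', n)
    return struct.pack('<BQ', 253, n)


def decode_size(f):
    """Read a size item from a binary file object.

    Returns (size, kind) where kind is 'plain', 'closed_stream' (254) or
    'unclosed_stream' (255). For unclosed streams the stored count is ignored.
    """
    first = f.read(1)[0]
    if first < 251:
        return first, 'plain'
    if first in (251, 252):
        raise ValueError(f"reserved size byte: {first}")
    (n,) = struct.unpack('<Q', f.read(8))
    if first == 253:
        return n, 'plain'
    if first == 254:
        return n, 'closed_stream'
    return None, 'unclosed_stream'


size, kind = decode_size(io.BytesIO(encode_size(1000)))  # (1000, 'plain')
```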

null

The value null/nil/none is identified by v (for void), and has no data.

booleans

The values false and true are identified by n for no, and y for yes, respectively. These values have no data.

integers

Integer values come in two flavours:

  • h: small values (between -32768 and 32767, inclusive, the int16 range) can be encoded using int16.
  • i: int64
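
A sketch of choosing between the two integer flavours, with hypothetical function names; values outside the int64 range make struct raise an error.

```python
import struct


def encode_int(value: int) -> bytes:
    """Encode as 'h' (int16) when the value fits, otherwise as 'i' (int64)."""
    if -32768 <= value <= 32767:  # the int16 range
        return b'h' + struct.pack('<h', value)
    return b'i' + struct.pack('<q', value)  # struct.error outside the int64 range


def decode_int(data: bytes) -> int:
    """Decode an 'h' or 'i' value, including its identifier byte."""
    fmt = '<h' if data[:1] == b'h' else '<q'
    return struct.unpack(fmt, data[1:])[0]
```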

floats

Float values follow the IEEE 754 standard, can be NaN and inf, and come in two flavours:

  • f: a 32bit float
  • d: a 64bit float
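
A sketch of both float flavours with hypothetical function names. The float64 flag mirrors the option of the same name in the implementations; the 32-bit form loses precision, and struct raises OverflowError for finite values outside the float32 range.

```python
import struct


def encode_float(value: float, float64: bool = True) -> bytes:
    """'d' + 64-bit IEEE 754 (default) or 'f' + 32-bit, little endian."""
    if float64:
        return b'd' + struct.pack('<d', value)
    return b'f' + struct.pack('<f', value)


def decode_float(data: bytes) -> float:
    """Decode an 'f' or 'd' value, including its identifier byte."""
    fmt = '<d' if data[:1] == b'd' else '<f'
    return struct.unpack(fmt, data[1:])[0]
```

Encoding 42.001 this way yields the same 9 bytes that end the Python example's output near the top of this page.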

strings

String values are identified by s (for string), and consist of a size item (1 or 9 bytes), followed by the bytes that represent the UTF-8 encoded string.
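
A sketch of the string encoding (the function name is hypothetical). Note that the size counts UTF-8 bytes, not characters:

```python
import struct


def encode_str(text: str) -> bytes:
    """'s' + size item + UTF-8 bytes; the size counts bytes, not characters."""
    data = text.encode('utf-8')
    n = len(data)
    size = struct.pack('<B', n) if n < 251 else struct.pack('<BQ', 253, n)
    return b's' + size + data
```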

blobs

Binary data is encoded as follows:

  • char b (for blob)
  • uint8 value indicating the compression. 0 means no compression, 1 means zlib, 2 means bz2.
  • allocated_size: the amount of space allocated for the blob, in bytes.
  • used_size: the amount of used space for the blob, in bytes.
  • data_size: the size of the blob when decompressed, in bytes. If compression is off, it must be equal to used_size.
  • checksum: a single byte 0x00 means there is no checksum; a byte 0xFF means there is one, and is followed by a 16-byte MD5 hash of the used (compressed) bytes.
  • Byte alignment indicator: a uint8 number indicating the number of bytes to skip before the data starts. Implementations must align the data to 8-byte boundaries, but larger boundaries (up to 256) are allowed.
  • Empty space: a number of empty bytes, as indicated by the byte alignment indicator.
  • The binary blob, used_size bytes.
  • Empty space, allocated_size minus used_size bytes. This space may have been caused by a reduction of the size of the blob, or may be allocated to allow increasing the size of the blob.

Note: at this moment, some implementations can write checksums, but none actually use them to validate the data. A policy w.r.t. checksums will have to be decided on, and implementations will have to implement it.
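
A sketch of writing an uncompressed blob following the layout above. Assumptions: the caller passes the byte offset at which the blob starts in the output (needed to compute the alignment), and this sketch pads with the smallest number of bytes that puts the data on an 8-byte boundary. The function names are hypothetical.

```python
import hashlib
import struct


def encode_size(n: int) -> bytes:
    return struct.pack('<B', n) if n < 251 else struct.pack('<BQ', 253, n)


def encode_blob(data: bytes, offset: int, extra_size: int = 0,
                use_checksum: bool = False) -> bytes:
    """Encode an uncompressed blob that starts at byte `offset` of the output."""
    used = len(data)
    head = b'b' + struct.pack('<B', 0)  # identifier + compression (0 = none)
    head += encode_size(used + extra_size)  # allocated_size
    head += encode_size(used)  # used_size
    head += encode_size(used)  # data_size equals used_size without compression
    if use_checksum:
        head += b'\xff' + hashlib.md5(data).digest()
    else:
        head += b'\x00'
    # Pad so that offset + header + alignment byte + padding is a multiple of 8.
    padding = -(offset + len(head) + 1) % 8
    head += struct.pack('<B', padding) + b'\x00' * padding
    return head + data + b'\x00' * extra_size  # trailing space for growth
```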

lists

List values consist of the identifier l (for list), followed by a size item that represents the length of the list n. After that, n values follow, which can be of any type.

mappings

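Mappings, a.k.a. dictionaries or structs, consist of the identifier m (for mapping), followed by a size item that represents the length of the mapping n. After that, n items follow, each a combination of a key and a value. The key is encoded like a string but without the s identifier, i.e. as a size item followed by the UTF-8 encoded bytes (in the Python example near the top of this page, the key foo appears as \x03foo).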
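
Together with the scalar encodings, the list and mapping layout is enough for a small round-trip encoder. This is a minimal sketch: it handles only null, booleans, integers, floats, strings, lists and mappings (no blobs, streams or extensions), omits the file header, and its function names are hypothetical. Encoding the list from the Python example near the top of this page gives the bytes shown there, minus the leading b'BSDF\x02\x00'.

```python
import io
import struct

CONSTANTS = {b'v': None, b'y': True, b'n': False}
NUMBERS = {b'h': '<h', b'i': '<q', b'f': '<f', b'd': '<d'}


def _size(n: int) -> bytes:
    return struct.pack('<B', n) if n < 251 else struct.pack('<BQ', 253, n)


def _read_size(f) -> int:
    first = f.read(1)[0]
    if first < 251:
        return first
    if first == 253:
        return struct.unpack('<Q', f.read(8))[0]
    raise ValueError("streams and reserved sizes are not handled in this sketch")


def encode(value) -> bytes:
    """Encode base types (no blobs, no extensions) to BSDF value bytes."""
    if value is None or isinstance(value, bool):
        return {None: b'v', True: b'y', False: b'n'}[value]
    if isinstance(value, int):
        if -32768 <= value <= 32767:
            return b'h' + struct.pack('<h', value)
        return b'i' + struct.pack('<q', value)
    if isinstance(value, float):
        return b'd' + struct.pack('<d', value)
    if isinstance(value, str):
        data = value.encode('utf-8')
        return b's' + _size(len(data)) + data
    if isinstance(value, (list, tuple)):
        return b'l' + _size(len(value)) + b''.join(encode(v) for v in value)
    if isinstance(value, dict):
        parts = [b'm', _size(len(value))]
        for key, val in value.items():
            k = key.encode('utf-8')
            parts += [_size(len(k)), k, encode(val)]  # key: size + UTF-8, no 's'
        return b''.join(parts)
    raise TypeError(f"cannot encode {type(value).__name__}")


def decode(f):
    """Decode one value from a binary file object; inverse of encode()."""
    c = f.read(1)
    if c in CONSTANTS:
        return CONSTANTS[c]
    if c in NUMBERS:
        fmt = NUMBERS[c]
        return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]
    if c == b's':
        return f.read(_read_size(f)).decode('utf-8')
    if c == b'l':
        return [decode(f) for _ in range(_read_size(f))]
    if c == b'm':
        result = {}
        for _ in range(_read_size(f)):
            key = f.read(_read_size(f)).decode('utf-8')
            result[key] = decode(f)
        return result
    raise ValueError(f"unsupported identifier {c!r}")


example = ['just some objects', {'foo': True, 'bar': None}, 42.001]
bb = encode(example)
```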

Streaming

Streams allow data to be written and read in a "lazy" fashion. Implementations are not required to support streaming itself, but must be able to read data that contains (closed and unclosed) streams.

Data that is "streaming" must always be the last object in the file (except for its sub items). BSDF currently specifies that streaming is only supported for lists. It will likely also be supported for blobs.

Streams are identified by the size encoding which starts with 254 or 255, followed by an unsigned 64 bit integer. For closed streams (254), the integer represents the number of items in the stream. For unclosed streams (255) the 64 bit integer must be ignored.

Encoder implementations can thus close a stream by changing the 255 to 254 and writing the real size in the next 8 bytes. Alternatively, an implementation can turn it into a regular encoded list (not streamed) by writing 253 instead. Note that in the latter case the list cannot be read as a stream anymore.
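
A sketch of writing a list stream and closing it by patching its size item in place. Assumptions: the output is a seekable binary file object (here io.BytesIO) and the stream is the last object written; the helper names are hypothetical.

```python
import io
import struct


def start_stream(f) -> int:
    """Write the header of an unclosed list stream; return the position of its size item."""
    f.write(b'l')
    pos = f.tell()
    f.write(struct.pack('<BQ', 255, 0))  # 255 = unclosed, the count is ignored
    return pos


def close_stream(f, size_pos: int, count: int, unstream: bool = False) -> None:
    """Patch the size item: 254 marks a closed stream, 253 a regular list."""
    end = f.tell()
    f.seek(size_pos)
    f.write(struct.pack('<BQ', 253 if unstream else 254, count))
    f.seek(end)  # continue writing after the last item


# Stream three small integers ('h' + int16), then close the stream.
buf = io.BytesIO()
pos = start_stream(buf)
for i in range(3):
    buf.write(b'h' + struct.pack('<h', i))
close_stream(buf, pos, 3)
```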

BSDF Javascript implementation

This implementation of BSDF is intended for use in NodeJS or the browser. It is a "lite" implementation, without support for e.g. lazy loading or streaming.

Installation

Include bsdf.js in your project.

Usage

Basic usage:

var bsdf = require('bsdf.js');
var data1 = ['just some objects', {foo: true, bar: null}, 42.001];
var bytes = bsdf.encode(data1);  // produces an ArrayBuffer
var data2 = bsdf.decode(bytes);  // bytes can be ArrayBuffer, DataView or Uint8Array.

Full example using extensions:

// A class that we want to encode
function MyOb(val) {
    this.val = val;
}
// The extension that can encode/decode it.
// Each function receives the serializer (s) and the value (v).
var myext = {name: 'test.myob',
             match: function (s, v) { return v instanceof MyOb; },
             encode: function (s, v) { return v.val; },
             decode: function (s, v) { return new MyOb(v); }
             };
// Determine extensions to use (include standard ones)
var extensions = bsdf.standard_extensions.concat([myext]);
// Encode and decode
var data1 = new MyOb(42);
var bytes = bsdf.encode(data1, extensions);
var data2 = bsdf.decode(bytes);  // -> the raw value, 42 (extension not known here)
var data3 = bsdf.decode(bytes, extensions);  // a MyOb instance with value 42

Reference:

Function encode(data, extensions)

Encode the data, using the provided extensions (or the standard extensions if not given). Returns an ArrayBuffer representing the encoded data. See BsdfSerializer.encode() for details.

Function decode(blob, extensions)

Decode the blob, using the provided extensions (or the standard extensions if not given). Returns the decoded data. See BsdfSerializer.decode() for details.

Class BsdfSerializer(extensions)

Provides a BSDF serializer object with a particular set of extensions.

Method add_extension(extension)

Add an extension object to the serializer.

Method remove_extension(extension)

Remove an extension instance (and any extension with the same name).

Method encode(data)

Encode the data and return an ArrayBuffer representing the encoded data.

Any ArrayBuffer and DataView objects present in the data are interpreted as byte blobs, while Uint8Array objects are interpreted as typed arrays.

Method decode(blob)

Decode the blob and return the decoded data.

Any encoded byte blobs will be represented using DataView objects that provide a view (not a copy) on the input data. These can be mapped to an array with e.g. a = new Uint8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength). If needed, a copy can be made with a = new Uint8Array(a).

Extensions

Extensions are represented by objects that have the following attributes:

  • a string name indicating the identifier of the extension.
  • a function match(s, v) that is called with a serializer object and a value, and should return true if the extension should be used.
  • a function encode(s, v) that converts a value to more primitive objects.
  • a function decode(s, v) that converts primitive objects into the intended form.

BSDF Matlab/Octave implementation

This is the implementation of the BSDF format for Matlab/Octave. It's in good shape and well tested, though it could do with some love from a Matlab expert to optimize the code and/or improve the implementation, e.g. by allowing custom extensions.

Installation

Download Bsdf.m and place it in a directory where Matlab can find it, e.g. by doing:

addpath('/path/to/bsdf');

Usage

Functionality is provided via a single Bsdf class:

>> bsdf = Bsdf()
>> b = bsdf.encode({'just some objects', struct('foo', true, 'bar', []), 42.001});
>> size(b)
ans =
   48    1
>> bsdf.decode(b)
ans =
{
  [1,1] = just some objects
  [1,2] =

    scalar structure containing the fields:

      foo = 1
      bar = [](0x0)

  [1,3] =  42.001
}

Reference:

Class Bsdf()

This class represents the main API to use BSDF in Matlab.

Options (for writing) are provided as object properties:

  • compression: the compression for binary blobs, 0 for raw, 1 for zlib (not available in Octave).
  • float64: whether to export floats as 64 bit (default) or 32 bit.
  • use_checksum: whether to write checksums for binary blobs, not yet implemented.

Method save(filename, data)

Save data to a file.

Method load(filename)

Load data from a file.

Method encode(data)

Serialize data to bytes. Returns a blob of bytes (a uint8 array).

Method decode(blob)

Load data from bytes.

BSDF Python implementation

This is the reference implementation of BSDF, with support for streamed reading and writing, and lazy loading of binary blobs. See also the minimal version of BSDF in Python.

Installation

Installing via pip will install bsdf.py as well as the CLI:

$ pip install bsdf

Alternatively, one can copy bsdf.py to a directory on your PYTHONPATH. Copy bsdf_cli.py along to be able to use the CLI.

There are no dependencies except Python 2.7 or Python 3.4+.

Usage

Simple use:

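import bsdf

# Some data, made of BSDF base types
my_object = {'name': 'example', 'values': [1, 2.5, None, True]}

# Encode
bb = bsdf.encode(my_object)

# Decode
my_object2 = bsdf.decode(bb)  # equal to my_object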

Example advanced use:

import bsdf

class MyFunctionExtension(bsdf.Extension):
    """ An extension that can encode function objects and reload them if the
    function is in the global scope.
    """
    name = 'my.func'
    def match(self, s, f):
        return callable(f)
    def encode(self, s, f):
        return f.__name__
    def decode(self, s, name):
        return globals()[name]  # in reality, one would do a smarter lookup here

# Setup a serializer with extensions and options
serializer = bsdf.BsdfSerializer([MyFunctionExtension],
                                 compression='bz2')

def foo():
    print(42)

# Use it
bb = serializer.encode(foo)
foo2 = serializer.decode(bb)

foo2()  # prints 42

For more examples, see the Python example notebook.

Reference:

function encode(ob, extensions=None, **options)

Save (BSDF-encode) the given object to bytes. See BsdfSerializer for details on extensions and options.

function decode(bb, extensions=None, **options)

Load a (BSDF-encoded) structure from bytes. See BsdfSerializer for details on extensions and options.

function save(f, ob, extensions=None, **options)

Save (BSDF-encode) the given object to the given filename or file object. See BsdfSerializer for details on extensions and options.

function load(f, extensions=None, **options)

Load a (BSDF-encoded) structure from the given filename or file object. See BsdfSerializer for details on extensions and options.

class BsdfSerializer(extensions=None, **options)

Instances of this class represent a BSDF encoder/decoder.

It acts as a placeholder for a set of extensions and encoding/decoding options. Use this to predefine extensions and options for high performance encoding/decoding. For general use, see the functions save(), encode(), load(), and decode().

This implementation of BSDF supports streaming lists (keep adding to a list after writing the main file), lazy loading of blobs, and in-place editing of blobs (for streams opened with a+).

Options for encoding:

  • compression (int or str): 0 or "no" for no compression (default), 1 or "zlib" for Zlib compression (same as zip files and PNG), and 2 or "bz2" for Bz2 compression (more compact but slower writing). Note that some BSDF implementations (e.g. JavaScript) may not support compression.
  • use_checksum (bool): whether to include a checksum with binary blobs.
  • float64 (bool): Whether to write floats as 64 bit (default) or 32 bit.

Options for decoding:

  • load_streaming (bool): if True, and the final object in the structure was a stream, will make it available as a stream in the decoded object.
  • lazy_blob (bool): if True, bytes are represented as Blob objects that can be used to lazily access the data, and also overwrite the data if the file is open in a+ mode.

method add_extension(extension_class)

Add an extension to this serializer instance, which must be a subclass of Extension. Can be used as a decorator.

method remove_extension(name)

Remove an extension by its unique name.

method encode(ob)

Save the given object to bytes.

method save(f, ob)

Write the given object to the given file object.

method decode(bb)

Load the data structure that is BSDF-encoded in the given bytes.

method load(f)

Load a BSDF-encoded object from the given file object.

class Extension()

Base class to implement BSDF extensions for special data types.

Extension classes are provided to the BSDF serializer, which instantiates the class. That way, the extension can be somewhat dynamic: e.g. the NDArrayExtension exposes the ndarray class only when numpy is imported.

An extension instance must have two attributes. These can be attributes of the class, or of the instance set in __init__():

  • name (str): the name by which encoded values will be identified.
  • cls (type): the type (or list of types) to match values with. This is optional, but it makes the encoder select extensions faster.

Further, it needs 3 methods:

  • match(serializer, value) -> bool: return whether the extension can convert the given value. The default is isinstance(value, self.cls).
  • encode(serializer, value) -> encoded_value: the function to encode a value to more basic data types.
  • decode(serializer, encoded_value) -> value: the function to decode an encoded value back to its intended representation.

class ListStream(mode='w')

A streamable list object used for writing or reading. In read mode, it can also be iterated over.

method append(item)

Append an item to the streaming list. The object is immediately serialized and written to the underlying file.

method close(unstream=False)

Close the stream, marking the number of written elements. New elements may still be appended, but they won't be read during decoding. If unstream is True, the stream is turned into a regular list (not streaming).

method next()

Read and return the next element in the streaming list. Raises StopIteration if the stream is exhausted.

class Blob(bb, compression=0, extra_size=0, use_checksum=False)

Object to represent a blob of bytes. When used to write a BSDF file, it's a wrapper for bytes plus properties such as what compression to apply. When used to read a BSDF file, it can be used to read the data lazily, and also modify the data if reading in 'r+' mode and the blob isn't compressed.

method seek(p)

Seek to the given position (relative to the blob start).

method tell()

Get the current file pointer position (relative to the blob start).

method write(bb)

Write bytes to the blob.

method read(n)

Read n bytes from the blob.

method get_bytes()

Get the contents of the blob as bytes.

method update_checksum()

Reset the blob's checksum if present. Call this after modifying the data.

BSDF Python lite implementation

This is a lightweight implementation of BSDF in Python. Fully functional (including support for custom extensions) but no fancy features like lazy loading or streaming. With less than 500 lines of code (including docstrings) this demonstrates how simple a BSDF implementation can be. See also the complete version of BSDF in Python.

Installation

Copy bsdf_lite.py to a place where Python can find it. There are no dependencies except Python 3.4+.

Usage

import bsdf_lite

# Set up a serializer with options
serializer = bsdf_lite.BsdfLiteSerializer(compression='bz2')

# Use it
my_object1 = {'name': 'example', 'values': [1, 2.5, None, True]}
bb = serializer.encode(my_object1)
my_object2 = serializer.decode(bb)

Reference:

class BsdfLiteSerializer(extensions=None, **options)

Instances of this class represent a BSDF encoder/decoder.

This is a lite variant of the Python BSDF serializer. It does not support lazy loading or streaming, but is otherwise fully functional, including support for custom extensions.

It acts as a placeholder for a set of extensions and encoding/decoding options. Options for encoding:

  • compression (int or str): 0 or "no" for no compression (default), 1 or "zlib" for Zlib compression (same as zip files and PNG), and 2 or "bz2" for Bz2 compression (more compact but slower writing). Note that some BSDF implementations (e.g. JavaScript) may not support compression.
  • use_checksum (bool): whether to include a checksum with binary blobs.
  • float64 (bool): Whether to write floats as 64 bit (default) or 32 bit.

method add_extension(extension_class)

Add an extension to this serializer instance, which must be a subclass of Extension. Can be used as a decorator.

method remove_extension(name)

Remove an extension by its unique name.

method encode(ob)

Save the given object to bytes.

method save(f, ob)

Write the given object to the given file object.

method decode(bb)

Load the data structure that is BSDF-encoded in the given bytes.

method load(f)

Load a BSDF-encoded object from the given file object.

class Extension()

Base class to implement BSDF extensions for special data types.

Extension classes are provided to the BSDF serializer, which instantiates the class. That way, the extension can be somewhat dynamic: e.g. the NDArrayExtension exposes the ndarray class only when numpy is imported.

An extension instance must have two attributes. These can be attributes of the class, or of the instance set in __init__():

  • name (str): the name by which encoded values will be identified.
  • cls (type): the type (or list of types) to match values with. This is optional, but it makes the encoder select extensions faster.

Further, it needs 3 methods:

  • match(serializer, value) -> bool: return whether the extension can convert the given value. The default is isinstance(value, self.cls).
  • encode(serializer, value) -> encoded_value: the function to encode a value to more basic data types.
  • decode(serializer, encoded_value) -> value: the function to decode an encoded value back to its intended representation.