
Changelog

1.5.0 - 2020-05-09

  • tests: fix tests in case HOME is overridden #12
  • uri2fsn: Handle a subset of legacy UNC file URIs on Windows #14
  • expanduser: Ignore HOME env var on Windows like Python 3.8 #15

1.4.1 - 2019-12-26

  • fsn2uri(): Fix handling of surrogates with PyPy3 on Windows

1.4.0 - 2019-11-17

  • Python 3.3 support removed
  • Added type annotations

1.3.4 - 2018-01-23

  • fsn2bytes() and bytes2fsn() now default to “wtf-8” for the Windows path encoding instead of having no default.

1.3.3 - 2018-01-03

  • Restore WinXP support
  • Fix some warnings with Python 3.6

1.3.2 - 2017-11-05

  • Tests: Fix some errors with newer pytest and make the test suite work on native Windows.

1.3.1 - 2017-07-29

  • Fixed missing normalization with path2fsn() on Linux + Python 3

1.3.0 - 2017-07-28

1.2.2 - 2016-12-18

  • uri2fsn: improve error handling on unescaped URIs #4

1.2.1 - 2016-12-07

  • isinstance(path, fsnative) now checks the value as well. If it returns True, passing the instance to path2fsn will never fail.

1.2.0 - 2016-12-06

  • fsnative: safeguard against containing null bytes. All operations converting to fsnative will now fail if the result would contain null bytes. This means passing fsnative to functions like open() is now always safe.

1.1.0 - 2016-12-05

1.0.1 - 2016-10-25

  • Python 2.6 support removed
  • print_(): allow None for end, sep and file arguments
  • print_(): always output utf-8 when redirected on Windows

1.0.0 - 2016-09-09

  • First stable release

0.4.0 - 2016-09-07

  • Support paths with surrogates under Windows

0.3.0 - 2016-09-03

0.2.0 - 2016-08-25

0.1.0 - 2016-08-22

  • Initial release

Tutorial

There are various ways to create fsnative instances:

# create from unicode text
>>> senf.fsnative(u"foo")
'foo'

# create from some serialized format
>>> senf.bytes2fsn(b"foo", "utf-8")
'foo'

# create from a URI
>>> senf.uri2fsn("file:///foo")
'/foo'

# create from some Python path-like
>>> senf.path2fsn(b"foo")
'foo'

You can mix and match the fsnative type with ASCII str on all Python versions and platforms:

>>> senf.fsnative(u"foo") + "bar"
'foobar'
>>> senf.fsnative(u"foo").endswith("foo")
True
>>> "File: %s" % senf.fsnative(u"foo")
'File: foo'

Now that we have a fsnative, what can we do with it?

>>> path = senf.fsnative(u"/foo")

# We can print it
>>> senf.print_(path)
/foo

# We can convert it to text for our favorite GUI toolkit
>>> senf.fsn2text(path)
'/foo'

# We can convert it to an ASCII only URI
>>> senf.fsn2uri(path)
'file:///foo'

# We can serialize the path so we can save it somewhere
>>> senf.fsn2bytes(path, "utf-8")
b'/foo'
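
The URI conversion above can be approximated with the stdlib on a POSIX system. The following is a minimal sketch, not how Senf implements fsn2uri() or uri2fsn(): the helper names posix_path_to_uri and uri_to_posix_path are hypothetical, and Windows drive letters and UNC paths are deliberately ignored. It assumes undecodable bytes are carried in the str via the surrogateescape error handler.

```python
from urllib.parse import quote, unquote, urlparse


def posix_path_to_uri(path):
    """Turn an absolute POSIX str path into an ASCII-only file URI."""
    # surrogateescape turns escaped bytes back into their raw values
    # before percent-encoding, so no information is lost.
    return "file://" + quote(path, errors="surrogateescape")


def uri_to_posix_path(uri):
    """Turn a file URI back into a POSIX str path."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("not a file URI: %r" % (uri,))
    return unquote(parsed.path, errors="surrogateescape")


uri = posix_path_to_uri("/music/my song.mp3")
print(uri)
print(uri_to_posix_path(uri))
```

The round trip only works because both directions agree on the error handler; a strict encode would raise on a path that holds escaped bytes.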

The functions in the stdlib usually return the same type as was passed in. If we pass in a fsnative to os.listdir, we get one back as well.

>>> files = os.listdir(senf.fsnative(u"."))
>>> isinstance(files[0], senf.fsnative)
True
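
The "same type in, same type out" behavior can be observed with plain Python 3 types as well. This is a small stdlib-only sketch that does not use Senf; the helper name listdir_types is hypothetical.

```python
import os
import tempfile


def listdir_types(directory):
    """List a directory twice: once with a str path, once with a bytes path."""
    text_names = os.listdir(os.fsdecode(directory))   # str in, list of str out
    bytes_names = os.listdir(os.fsencode(directory))  # bytes in, list of bytes out
    return text_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, bytes_names = listdir_types(tmp)
    print(type(text_names[0]).__name__, type(bytes_names[0]).__name__)
```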

In some cases the stdlib functions don’t take arguments and always return the same type. For those cases Senf provides alternative implementations.

>>> isinstance(senf.getcwd(), senf.fsnative)
True

A similar problem arises with stdlib collections. Senf provides alternatives for sys.argv and os.environ.

>>> isinstance(senf.argv[0], senf.fsnative)
True
>>> isinstance(senf.environ["PATH"], senf.fsnative)
True

It also provides alternatives for functions related to os.environ.

>>> isinstance(senf.getenv("HOME"), senf.fsnative)
True
>>> isinstance(senf.expanduser("~"), senf.fsnative)
True

If you work with files a lot, your unit tests will probably need temporary files. Senf provides wrappers for tempfile functions which always return an fsnative.

>>> senf.mkdtemp()
'/tmp/tmp26Daqo'
>>> isinstance(_, senf.fsnative)
True

API Documentation

Stdlib Replacements

Alternative implementations or wrappers of stdlib functions and constants. In some cases their default is changed to return an fsnative path (mkdtemp() with default arguments), or Unicode support for Windows is added (sys.argv).

environ os.environ replacement
argv sys.argv replacement
sep os.sep replacement
pathsep os.pathsep replacement
curdir os.curdir replacement
pardir os.pardir replacement
altsep os.altsep replacement
extsep os.extsep replacement
devnull os.devnull replacement
defpath os.defpath replacement
getcwd() os.getcwd replacement
getenv() os.getenv replacement
putenv() os.putenv replacement
unsetenv() os.unsetenv replacement
print_() print() replacement
input_() input() replacement
expanduser() os.path.expanduser() replacement
expandvars() os.path.expandvars() replacement
gettempdir() tempfile.gettempdir() replacement
gettempprefix() tempfile.gettempprefix() replacement
mkstemp() tempfile.mkstemp() replacement
mkdtemp() tempfile.mkdtemp() replacement

Misc Functions

supports_ansi_escape_codes() if the output file supports ANSI codes

Documentation Types

These types only exist for documentation purposes and represent different types depending on the Python version and platform used.

class senf.text

Represents unicode under Python 2 and str under Python 3. Does not include surrogates.

class senf.bytes

Represents str under Python 2 and bytes under Python 3.

class senf.pathlike

Anything the Python stdlib allows as a path. In addition to fsnative this allows

  • bytes encoded with the default file system encoding (usually mbcs) on Windows.
  • bytes under Python 3 + Unix.
  • unicode under Python 2 + Unix if it can be encoded with the default file system encoding.
  • (Python 3.6+) Instances whose type implements the __fspath__ protocol. See PEP 519 for details.
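
As an illustration of the last item, here is a minimal sketch of a class implementing the __fspath__ protocol on Python 3.6+. The class name Track is hypothetical and not part of Senf; only stdlib calls are used.

```python
import os


class Track:
    """A hypothetical object that knows where its file lives."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        # Called by os.fspath() and by stdlib functions that accept paths.
        return self._path


track = Track("/music/song.mp3")
print(os.fspath(track))         # the underlying str path
print(os.path.basename(track))  # stdlib path functions accept the object directly
```

Any object like this is something the stdlib accepts as a path, which is why it falls under pathlike.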

Examples

See https://github.com/quodlibet/senf/tree/master/examples for a few example programs.


Frequently Asked Questions

Are there any existing users of Senf?
It is currently used in Quod Libet and mutagen.
Why not use bytes for paths on Python 3 + Unix?

Downsides of using str: str cannot be safely pickled, as its content depends on the locale encoding. You have to use something like fsn2bytes first, or you have to make sure that the encoding doesn’t change across program invocations.

Upsides of using str: str has more support in the stdlib (pathlib for example) and it can be used in combination with the string literal "foo". The latter makes some_fsnative + "foo" work for all Python versions and platforms as long as the literal contains only ASCII.

Why the weird “foo2bar” function naming?
As the real types depend on the platform, anything like “decode”/“encode” is confusing. So you end up with “a_to_b” or “a_from_b”. And imo having things always go in one direction, being fast to parse visually, and not being too long makes this a good choice. But ymmv.
How can it be that fsnative() can’t fail, even with an ASCII encoding?
It falls back to utf-8 if encoding fails. Raising there would make everything complicated and there is no good way to handle that error case anyway.
Why not replace sys.stdout instead of providing a new print()?
No monkey patching. Allows us to do our own error handling so print will never fail. Printing some question marks is better than a stack trace if the target is a user.

Senf introduces a new platform native string type called fsnative. It adds functions to convert text, bytes and paths to and from that new type and helper functions to integrate it nicely with the Python stdlib.

Senf supports Python 2.7 and 3.4+ (Python 3.3 support was removed in 1.4.0), works with PyPy, runs on Linux, Windows, and macOS, is MIT licensed, and only depends on the stdlib. It does not monkey patch anything in the stdlib.

pip install senf

https://github.com/quodlibet/senf

Why?

OS strings are used in many different places across the Python stdlib. They are used for filesystem paths, for environment variables (os.environ), for program arguments (sys.argv and subprocess), for printing to the console (sys.stdout, sys.stderr) and more.

The problem with them is that they come in many shapes and forms and handling them has changed significantly between Python 2 and Python 3.

A valid platform native string is either bytes, unicode, str + surrogates (either through the surrogatepass or the surrogateescape error handler) or anything implementing the __fspath__ protocol. The values of those types depend on the Python version, the platform and the environment the program was started in. Ideally we don’t want to care about any of those details.
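
To make "str + surrogates" concrete, the following stdlib-only sketch shows how Python 3 on Unix carries undecodable bytes inside a str via the surrogateescape error handler. It does not use Senf; the helper names decode_name and encode_name are hypothetical, and the encoding is fixed to UTF-8 here, whereas a real program would see whatever the locale dictates.

```python
def decode_name(raw, encoding="utf-8"):
    """Decode bytes to str, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name(), restoring the original bytes exactly."""
    return name.encode(encoding, "surrogateescape")


raw = b"caf\xe9.txt"      # Latin-1 encoded file name, not valid UTF-8
name = decode_name(raw)   # the byte 0xE9 becomes the lone surrogate U+DCE9
print(ascii(name))
print(encode_name(name) == raw)

try:
    name.encode("utf-8")  # a strict encode rejects the surrogate
except UnicodeEncodeError as error:
    print("strict encode failed:", error.reason)
```

The resulting str round-trips losslessly only when the same encoding and error handler are used in both directions, which is one of the details a program otherwise has to track by hand.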


For example, assume you want to check the extension of a file name:

import os
from senf import path2fsn

def has_extension(filename, ext):
    root, filename_ext = os.path.splitext(path2fsn(filename))
    return filename_ext == path2fsn(ext)

This will just work everywhere. path2fsn() will convert anything which is considered a valid path by Python to an fsnative and then we can just compare by value. Note that Python stdlib functions always return the same type that was passed in, so os.path.splitext() will return two fsnative values.


Or you want to send a filename over some binary interface:

from senf import fsnative, fsn2bytes, bytes2fsn

def send(filename):
    assert isinstance(filename, fsnative)
    data = fsn2bytes(filename, "utf-8")
    return data

def receive(data):
    filename = bytes2fsn(data, "utf-8")
    return filename

fsn2bytes() converts the path to binary (“utf-8” is used on Windows, or “wtf-8” to be exact) and the receiving end re-creates the filename with bytes2fsn().
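
The reason plain “utf-8” is not enough on Windows is that Windows paths may contain lone surrogates, which strict UTF-8 refuses to encode. The sketch below approximates the idea with the stdlib's surrogatepass error handler. It is not Senf's implementation and not a complete WTF-8 codec (real WTF-8 also requires paired surrogates to be merged into a single code point first); the helper names to_wire and from_wire are hypothetical.

```python
def to_wire(name):
    """Serialize a str that may contain lone surrogates."""
    return name.encode("utf-8", "surrogatepass")


def from_wire(data):
    """Restore the str produced by to_wire()."""
    return data.decode("utf-8", "surrogatepass")


name = "bad\ud800name"    # contains a lone high surrogate
data = to_wire(name)
print(data)
print(from_wire(data) == name)

try:
    name.encode("utf-8")  # strict UTF-8 cannot represent the surrogate
except UnicodeEncodeError:
    print("strict utf-8 rejects this name")
```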


Another example is printing filenames and text to a console:

import os
from senf import print_, argv

for filename in os.listdir(argv[1]):
    print_(u"File: ", filename)

Senf provides its own print function which can output platform strings as is and mix them with text. No more encoding/decoding errors.
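
The general idea of a print function that never raises can be sketched with the stdlib alone. This is an illustration under simplifying assumptions, not Senf's print_(): the name safe_print is hypothetical, and unlike print_() it replaces unencodable characters with question marks instead of writing the original bytes.

```python
import io
import sys


def safe_print(*objects, sep=" ", end="\n", file=None):
    """Write objects to file, replacing characters the stream cannot encode."""
    file = sys.stdout if file is None else file
    text = sep.join(str(obj) for obj in objects) + end
    encoding = getattr(file, "encoding", None) or "utf-8"
    # Round-trip through the stream's encoding; anything unencodable
    # (including lone surrogates) becomes "?" instead of raising.
    file.write(text.encode(encoding, "replace").decode(encoding))


# Demo: an ASCII-only stream standing in for a limited console.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", newline="\n")
safe_print("File:", "caf\xe9\udcff.txt", file=stream)
stream.flush()
print(buffer.getvalue())
```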

In addition, Senf emulates ANSI escape sequence handling when using the Windows console and extends Python 2 under Windows with Unicode support for sys.argv and os.environ.

Who?

Senf is used by the following software:

  • Quod Libet - A multi platform music player
  • mutagen - A Python multimedia tagging library