Features

Features is a simple implementation of feature set algebra in Python.

Linguistic analyses commonly use sets of binary or privative features to refer to different groups of linguistic objects: for example, a group of phonemes that share some phonological features like [-consonantal, +high], or a set of morphemes that occur in the context of a specific person/number combination like [-participant, GROUP]. Usually, the features are applied in such a way that only some of their combinations are valid, while others are impossible (i.e. refer to no object), for example [+high, +low] or [-participant, +speaker].

With this package, such feature systems can be defined with a simple contingency table definition (feature matrix) and stored under a section name in a simple clear-text configuration file. Each feature system can then be loaded by its name and provides its own FeatureSet subclass that implements all comparisons and operations between its feature sets according to the given definition (compatibility, entailment, intersection, unification, etc.).

Features creates the complete lattice structure between the possible feature sets of each feature system and lets you navigate and visualize their relations using the Graphviz graph layout software.

Installation

This package runs under Python 2.7 and 3.3+. Use pip to install it:

$ pip install features

This will also install the concepts package from PyPI providing the Formal Concept Analysis (FCA) algorithms on which this package is based.

Quickstart

Load a predefined feature system by name (in this case features for a six-way person/number distinction, cf. the definitions in the bundled config.ini in the source repository).

>>> import features

>>> fs = features.FeatureSystem('plural')

>>> print(fs.context)  
<Context object mapping 6 objects to 10 properties at 0x...>
      |+1|-1|+2|-2|+3|-3|+sg|+pl|-sg|-pl|
    1s|X |  |  |X |  |X |X  |   |   |X  |
    1p|X |  |  |X |  |X |   |X  |X  |   |
    2s|  |X |X |  |  |X |X  |   |   |X  |
    2p|  |X |X |  |  |X |   |X  |X  |   |
    3s|  |X |  |X |X |  |X  |   |   |X  |
    3p|  |X |  |X |X |  |   |X  |X  |   |

Create feature sets from strings (which are parsed) or from string sequences. Retrieve intents as string sequences, and feature or extent strings, in their canonical (definition) order:

>>> fs('+1 +sg'), fs(['+2', '+2', '+sg']), fs(['+sg', '+3'])
(FeatureSet('+1 +sg'), FeatureSet('+2 +sg'), FeatureSet('+3 +sg'))

>>> fs('SG1').concept.intent
('+1', '-2', '-3', '+sg', '-pl')

>>> fs('1').string, fs('1').string_maximal, fs('1').string_extent
('+1', '+1 -2 -3', '1s 1p')

Use feature algebra: intersection (join), union/unification (meet), and set inclusion (extension/subsumption). Compare feature sets (logical connectives):

>>> fs('+1 +sg') % fs('+2 +sg')
FeatureSet('-3 +sg')

>>> fs('-3') ^ fs('+1') ^ fs('-pl')
FeatureSet('+1 +sg')

>>> fs('+3') > fs('-1') and fs('+pl') < fs('+2 -sg')
True

>>> fs('+1').incompatible_with(fs('+3')) and fs('+sg').complement_of(fs('+pl'))
True

Navigate the created subsumption lattice (Hasse graph) of all valid feature sets:

>>> fs('+1').upper_neighbors, fs('+1').lower_neighbors
([FeatureSet('-3'), FeatureSet('-2')], [FeatureSet('+1 +sg'), FeatureSet('+1 +pl')])

>>> fs('+1').upset()
[FeatureSet('+1'), FeatureSet('-3'), FeatureSet('-2'), FeatureSet('')]

>>> for f in fs:  
...     print('[%s] <-> {%s}' % (f.string_maximal, f.string_extent))
[+1 -1 +2 -2 +3 -3 +sg +pl -sg -pl] <-> {}
[+1 -2 -3 +sg -pl] <-> {1s}
...
[-1] <-> {2s 2p 3s 3p}
[] <-> {1s 1p 2s 2p 3s 3p}

See the docs on how to define, load, and use your own feature systems.

See also

  • concepts – Formal Concept Analysis with Python
  • fileconfig – Config file sections as objects
  • graphviz – Simple Python interface for Graphviz

License

Features is distributed under the MIT license.

User Guide

Installation

features is a pure-python package implementing feature set algebra as commonly used in linguistics. It runs under both Python 2.7 and 3.3+ and is available from PyPI. To install it using pip, run the following command:

$ pip install features

A system-wide install typically requires administrator access. For an isolated installation, run the same command inside a virtualenv or a venv (Python 3.3+ only).

The pip-command will automatically download and install the (pure-python) fileconfig and concepts packages (plus dependencies) from PyPI. The latter provides the lower level Formal Concept Analysis (FCA) algorithms on which features is based.

Features is essentially a convenience wrapper around the FCA-functionality of concepts.

Feature systems

Features includes some predefined feature systems that you can try out immediately and that are used as examples in this documentation. See below on how to define, persist, and load your own feature systems/definitions. To load a feature system, pass its name to features.FeatureSystem:

>>> import features

>>> fs = features.FeatureSystem('plural')

>>> fs
<FeatureSystem('plural') of 6 atoms 22 featuresets>

The built-in feature systems are defined in the config.ini file in the package directory (usually Lib/site-packages/features in your Python directory). You can either define new systems directly within a Python script or create your own INI file(s) with definitions, so that you can load and reuse feature systems in different scripts.

The definition of a feature system is stored in its context object. It is basically a cross-table giving the features (properties) for each thing to be described (object):

>>> print(fs.context)  
<Context object mapping 6 objects to 10 properties at 0x...>
      |+1|-1|+2|-2|+3|-3|+sg|+pl|-sg|-pl|
    1s|X |  |  |X |  |X |X  |   |   |X  |
    1p|X |  |  |X |  |X |   |X  |X  |   |
    2s|  |X |X |  |  |X |X  |   |   |X  |
    2p|  |X |X |  |  |X |   |X  |X  |   |
    3s|  |X |  |X |X |  |X  |   |   |X  |
    3p|  |X |  |X |X |  |   |X  |X  |   |

>>> fs.context.objects
('1s', '1p', '2s', '2p', '3s', '3p')

>>> fs.context.properties
('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')

>>> fs.context.bools  
[(True, False, False, True, False, True, True, False, False, True),
 (True, False, False, True, False, True, False, True, True, False),
 (False, True, True, False, False, True, True, False, False, True),
 (False, True, True, False, False, True, False, True, True, False),
 (False, True, False, True, True, False, True, False, False, True),
 (False, True, False, True, True, False, False, True, True, False)]

In other words, it provides a mapping from objects to features and vice versa. Check the documentation of the concepts package for further information on its full functionality.

>>> fs.context.intension(['1s', '1p'])  # common features?
('+1', '-2', '-3')

>>> fs.context.extension(['-3', '+sg'])  # common objects?
('1s', '2s')
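Both lookups are the derivation operators of Formal Concept Analysis. As an illustration only, here is a plain-Python sketch of them over a hand-copied version of the plural cross-table; the function names mirror the methods above, but this is not the concepts implementation:

```python
# Hand-copied cross-table of the 'plural' system (object -> its properties).
OBJECTS = ('1s', '1p', '2s', '2p', '3s', '3p')
PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


def intension(objects):
    """Properties shared by all given objects, in definition order."""
    return tuple(p for p in PROPERTIES
                 if all(p in CONTEXT[o] for o in objects))


def extension(properties):
    """Objects having all given properties, in definition order."""
    return tuple(o for o in OBJECTS
                 if all(p in CONTEXT[o] for p in properties))


print(intension(['1s', '1p']))   # ('+1', '-2', '-3')
print(extension(['-3', '+sg']))  # ('1s', '2s')
```

With an empty argument, intension() returns all properties and extension() returns all objects, which is what the contradictory and the tautological feature sets refer to.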

Feature sets

Every feature system contains a contradictory feature set with all features, which refers to no object:

>>> fs.infimum
FeatureSet('+1 -1 +2 -2 +3 -3 +sg +pl -sg -pl')

>>> fs.infimum.concept.extent
()

Every feature system also contains a maximally general, tautological feature set with no features, which refers to all objects:

>>> fs.supremum
FeatureSet('')

>>> fs.supremum.concept.extent
('1s', '1p', '2s', '2p', '3s', '3p')

Use the feature system to iterate over all defined feature sets in shortlex extent order:

>>> for f in fs:
...     print('%s %s' % (f, f.concept.extent))
[+1 -1 +2 -2 +3 -3 +sg +pl -sg -pl] ()
[+1 +sg] ('1s',)
[+1 +pl] ('1p',)
[+2 +sg] ('2s',)
[+2 +pl] ('2p',)
[+3 +sg] ('3s',)
[+3 +pl] ('3p',)
[+1] ('1s', '1p')
[-3 +sg] ('1s', '2s')
[-2 +sg] ('1s', '3s')
[-3 +pl] ('1p', '2p')
[-2 +pl] ('1p', '3p')
[+2] ('2s', '2p')
[-1 +sg] ('2s', '3s')
[-1 +pl] ('2p', '3p')
[+3] ('3s', '3p')
[+sg] ('1s', '2s', '3s')
[+pl] ('1p', '2p', '3p')
[-3] ('1s', '1p', '2s', '2p')
[-2] ('1s', '1p', '3s', '3p')
[-1] ('2s', '2p', '3s', '3p')
[] ('1s', '1p', '2s', '2p', '3s', '3p')
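To see where these 22 feature sets come from, here is a brute-force sketch that assumes nothing beyond the cross-table (concepts uses a proper FCA algorithm instead). Every concept extent is the extent of some set of properties, so collecting the extents of all 2^10 property subsets yields exactly the closed extents, which can then be sorted shortlex by object position:

```python
from itertools import combinations

OBJECTS = ('1s', '1p', '2s', '2p', '3s', '3p')
PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


def extension(properties):
    return tuple(o for o in OBJECTS if set(properties) <= CONTEXT[o])


def intension(objects):
    return tuple(p for p in PROPERTIES if all(p in CONTEXT[o] for o in objects))


def all_extents():
    """Closed extents: the extents of every subset of properties."""
    extents = set()
    for size in range(len(PROPERTIES) + 1):
        for props in combinations(PROPERTIES, size):
            extents.add(extension(props))
    # shortlex: fewer objects first, then by object definition order
    return sorted(extents, key=lambda ext: (len(ext), [OBJECTS.index(o) for o in ext]))


for ext in all_extents():
    print(intension(ext), ext)
```

The printed intents are the maximal ones; the order of the extents matches the listing above.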

The string representations will show the smallest possible notation for each feature set by default (shortlex minimum). The full representation is also available (and an extent-based representation):

>>> fs('1sg').string
'+1 +sg'

>>> fs('1sg').string_maximal
'+1 -2 -3 +sg -pl'

>>> fs('1sg').string_extent
'1s'

To use the maximal representation for __str__(), put str_maximal = true into the configuration file section (see below).
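One way to read "shortlex minimum" is: the shortest subset of the maximal intent that still has the same extent, with ties broken by definition order. A brute-force sketch of that reading (an assumption about the semantics, not the library's code):

```python
from itertools import combinations

OBJECTS = ('1s', '1p', '2s', '2p', '3s', '3p')
PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


def extension(properties):
    return tuple(o for o in OBJECTS if set(properties) <= CONTEXT[o])


def intension(objects):
    return tuple(p for p in PROPERTIES if all(p in CONTEXT[o] for o in objects))


def string_maximal(properties):
    """All features implied by the given ones (the closed intent)."""
    return ' '.join(intension(extension(properties)))


def string_minimal(properties):
    """Shortest subset of the closed intent with the same extent."""
    target = extension(properties)
    intent = intension(target)
    for size in range(len(intent) + 1):
        # combinations() keeps definition order, so ties resolve shortlex
        for subset in combinations(intent, size):
            if extension(subset) == target:
                return ' '.join(subset)


print(string_minimal(['+1', '-2', '+sg']), '|', string_maximal(['+1', '+sg']))
```

Unlike the library, this sketch does not special-case the contradictory infimum, which features prints with all of its features.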

Retrieval

You can call the feature system with an iterable of features to retrieve one of its feature sets:

>>> fs(['+1', '+sg'])
FeatureSet('+1 +sg')

Usually, it is more convenient to let the system extract the features from a string:

>>> fs('+1 +sg')
FeatureSet('+1 +sg')

Leading plusses can be omitted. Spaces are optional. Case, order, and duplication of features are ignored.

>>> fs('2 pl')
FeatureSet('+2 +pl')

>>> fs('SG3sg')
FeatureSet('+3 +sg')

Note that commas are not allowed inside the string.
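The parsing rules above (optional leading plus, optional spaces, case-insensitive, duplicates ignored, no commas) can be approximated with a regular expression tokenizer. The make_parser() helper below is hypothetical and only illustrates the rules; it is not the parser shipped with features:

```python
import re

# Feature names of the 'plural' system in definition order (copied from above).
PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')


def make_parser(properties):
    """Return a parse(string) function for the given feature names (hypothetical)."""
    # Bare names without sign, longest first so that longer names win.
    names = sorted({p.lstrip('+-') for p in properties}, key=len, reverse=True)
    token = re.compile(r'\s*([+-]?)(%s)' % '|'.join(map(re.escape, names)),
                       re.IGNORECASE)
    lookup = {p.lower(): p for p in properties}

    def parse(string):
        string, pos, found = string.strip(), 0, set()
        while pos < len(string):
            match = token.match(string, pos)
            if match is None:  # e.g. a comma or an unknown name
                raise ValueError('cannot parse %r at position %d' % (string, pos))
            sign = match.group(1) or '+'  # a missing sign means plus
            feature = lookup.get(sign + match.group(2).lower())
            if feature is None:
                raise ValueError('undefined feature %r' % match.group(0).strip())
            found.add(feature)
            pos = match.end()
        # Canonical definition order; duplicates collapse in the set.
        return [p for p in properties if p in found]

    return parse


parse = make_parser(PROPERTIES)
print(parse('SG3sg'), parse('2 pl'))
```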

Uniqueness

Feature sets are singletons. The constructor is also idempotent:

>>> fs('1sg') is fs('1sg')
True

>>> fs(fs('1sg')) is fs('1sg')
True

All different possible ways to notate a feature set map to the same instance:

>>> fs('+1 -2 -3 -sg +pl') is fs('1pl')
True

>>> fs('+sg') is fs('-pl')
True

Notations are equivalent when they refer to the same set of objects (i.e. have the same extent).
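Singleton behavior of this kind is commonly implemented by caching instances under a key in __new__. The sketch below uses a hypothetical FeatureSetSketch class (not the library's class) that keys its cache by extent, so every notation with the same extent yields the same object:

```python
# Hand-copied 'plural' cross-table (object -> properties).
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


class FeatureSetSketch(object):
    """Hypothetical stand-in for a feature set: one instance per extent."""

    _instances = {}  # frozenset extent -> the single instance for it

    def __new__(cls, features):
        if isinstance(features, cls):  # idempotent: instances pass through
            return features
        features = set(features)
        extent = frozenset(o for o, props in CONTEXT.items() if features <= props)
        if extent not in cls._instances:
            instance = object.__new__(cls)
            instance.extent = extent
            cls._instances[extent] = instance
        return cls._instances[extent]


FS = FeatureSetSketch
print(FS(['+sg']) is FS(['-pl']))  # same extent, same object
```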

Comparisons

Compatibility tests:

>>> fs('+1').incompatible_with(fs('+3'))
True

>>> fs('sg').complement_of(fs('pl'))
True

>>> fs('-1').subcontrary_with(fs('-2'))
True

>>> fs('+1').orthogonal_to(fs('+sg'))
True

Set inclusion (subsumption):

>>> fs('') < fs('-3') <= fs('-3') < fs('+1') < fs('1sg')
True
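All of these tests can be phrased in terms of extents, following the descriptions in the API reference (incompatible_with: empty common extent; complement_of: additionally a universal extent union; subcontrary_with: nonempty common extent and universal union; subsumption: extent inclusion). Below is a set-based sketch with hypothetical helper names. For orthogonal_to it assumes that the union must fall short of the universe, which is what separates orthogonal from subcontrary pairs:

```python
# Hand-copied 'plural' cross-table (object -> properties).
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}
UNIVERSE = frozenset(CONTEXT)


def ext(*features):
    """Extent of a feature combination (hypothetical helper)."""
    return frozenset(o for o, props in CONTEXT.items() if set(features) <= props)


def incompatible(a, b):
    return not (a & b)  # empty common extent


def complement(a, b):
    return not (a & b) and (a | b) == UNIVERSE  # ... plus universal union


def subcontrary(a, b):
    return bool(a & b) and (a | b) == UNIVERSE  # overlap plus universal union


def orthogonal(a, b):
    # overlap, incomparable, and (assumption) a non-universal union
    return bool(a & b) and not (a <= b or b <= a) and (a | b) != UNIVERSE


def subsumes(a, b):
    return a >= b  # a is more general: its extent includes b's


print(incompatible(ext('+1'), ext('+3')), complement(ext('+sg'), ext('+pl')))
print(subsumes(ext(), ext('-3')) and subsumes(ext('-3'), ext('+1')))
```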

Operations

Intersection (join, generalization, closest feature set that subsumes the given ones):

>>> fs('1sg') % fs('2sg')  # common features, or?
FeatureSet('-3 +sg')

Intersect an iterable of feature sets:

>>> fs.join([fs('+1'), fs('+2'), fs('1sg')])
FeatureSet('-3')

Union (meet, unification, closest feature set that implies the given ones):

>>> fs('-1') ^ fs('-2')  # combined features, and?
FeatureSet('+3')

Unify an iterable of feature sets:

>>> fs.meet([fs('+1'), fs('+sg'), fs('-3')])
FeatureSet('+1 +sg')
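Join and meet reduce to set operations on extents: the join keeps the features common to the union of the extents, the meet those common to their intersection. A plain-Python sketch with hypothetical helpers, returning maximal intents rather than FeatureSet objects:

```python
PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


def extent(features):
    return frozenset(o for o, props in CONTEXT.items() if set(features) <= props)


def intent(objects):
    """Common properties of the objects, in definition order."""
    return tuple(p for p in PROPERTIES if all(p in CONTEXT[o] for o in objects))


def join(*featuresets):
    """Nearest feature set subsuming all: features common to the union of extents."""
    objects = set().union(*(extent(f) for f in featuresets))
    return intent(objects)


def meet(*featuresets):
    """Nearest feature set implying all: features of the intersection of extents."""
    objects = frozenset(CONTEXT).intersection(*(extent(f) for f in featuresets))
    return intent(objects)


print(join(['+1', '+sg'], ['+2', '+sg']), meet(['-1'], ['-2']))
```

For example, ('-3', '+sg', '-pl') is the maximal form of FeatureSet('-3 +sg').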

Relations

Immediately implied/subsumed neighbors.

>>> fs('+1').upper_neighbors
[FeatureSet('-3'), FeatureSet('-2')]

>>> fs('+1').lower_neighbors
[FeatureSet('+1 +sg'), FeatureSet('+1 +pl')]

Complete set of implied/subsumed neighbors.

>>> list(fs('+1').upset())
[FeatureSet('+1'), FeatureSet('-3'), FeatureSet('-2'), FeatureSet('')]

>>> list(fs('+1').downset())  
[FeatureSet('+1'),
 FeatureSet('+1 +sg'), FeatureSet('+1 +pl'),
 FeatureSet('+1 -1 +2 -2 +3 -3 +sg +pl -sg -pl')]
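Upper and lower neighbors are the covering relations of the lattice: the minimal strictly larger and the maximal strictly smaller extents. Below is a brute-force sketch over all closed extents with hypothetical helper names; unlike the library, it returns its results in arbitrary order:

```python
from itertools import combinations

PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '-sg', '-pl')
CONTEXT = {
    '1s': {'+1', '-2', '-3', '+sg', '-pl'},
    '1p': {'+1', '-2', '-3', '+pl', '-sg'},
    '2s': {'-1', '+2', '-3', '+sg', '-pl'},
    '2p': {'-1', '+2', '-3', '+pl', '-sg'},
    '3s': {'-1', '-2', '+3', '+sg', '-pl'},
    '3p': {'-1', '-2', '+3', '+pl', '-sg'},
}


def extent(features):
    """Objects having all the given features."""
    return frozenset(o for o, props in CONTEXT.items() if set(features) <= props)


# Every closed extent is the extent of some property subset (brute force).
EXTENTS = {extent(c) for n in range(len(PROPERTIES) + 1)
           for c in combinations(PROPERTIES, n)}


def upset(ext):
    """ext itself and all more general (larger) extents."""
    return [e for e in EXTENTS if e >= ext]


def downset(ext):
    """ext itself and all more specific (smaller) extents."""
    return [e for e in EXTENTS if e <= ext]


def upper_neighbors(ext):
    """Minimal extents strictly above ext."""
    above = [e for e in EXTENTS if e > ext]
    return [e for e in above if not any(f < e for f in above)]


def lower_neighbors(ext):
    """Maximal extents strictly below ext."""
    below = [e for e in EXTENTS if e < ext]
    return [e for e in below if not any(e < f for f in below)]


first_person = extent(['+1'])
print(sorted(sorted(e) for e in upper_neighbors(first_person)))
```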

Definition

If you do not need to save your definition, you can directly create a system from an ASCII-art style table:

>>> fs = features.make_features('''
...      |+male|-male|+adult|-adult|
... man  |  X  |     |   X  |      |
... woman|     |  X  |   X  |      |
... boy  |  X  |     |      |   X  |
... girl |     |  X  |      |   X  |
... ''', str_maximal=False)

>>> fs  
<FeatureSystem object of 4 atoms 10 featuresets at 0x...>

>>> for f in fs:
...     print('%s %s' % (f, f.concept.extent))
[+male -male +adult -adult] ()
[+male +adult] ('man',)
[-male +adult] ('woman',)
[+male -adult] ('boy',)
[-male -adult] ('girl',)
[+adult] ('man', 'woman')
[+male] ('man', 'boy')
[-male] ('woman', 'girl')
[-adult] ('boy', 'girl')
[] ('man', 'woman', 'boy', 'girl')

Note that the strings representing the objects and the features need to be disjoint, and no feature name may be a substring of another.
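The table format is simple enough to parse by hand. The parse_table() helper below is hypothetical and only illustrates the format (it is not the parser used by make_features()); it also enforces the rule that object and feature names are disjoint, but does not check the substring rule:

```python
def parse_table(text):
    """Parse an ASCII-art cross-table into (objects, properties, bools)."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split('|')]
    properties = tuple(cell for cell in header[1:] if cell)
    objects, bools = [], []
    for line in lines[1:]:
        cells = line.split('|')
        objects.append(cells[0].strip())
        # an 'X' marks that the object has the property of that column
        bools.append(tuple(cell.strip() == 'X'
                           for cell in cells[1:1 + len(properties)]))
    if set(objects) & set(properties):
        raise ValueError('object and feature names must be disjoint')
    return tuple(objects), properties, bools


TABLE = '''
     |+male|-male|+adult|-adult|
man  |  X  |     |   X  |      |
woman|     |  X  |   X  |      |
boy  |  X  |     |      |   X  |
girl |     |  X  |      |   X  |
'''

objects, properties, bools = parse_table(TABLE)
print(objects, properties)
```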

To load feature systems by name, create an INI-file with your configurations, for example:

# phonemes.ini - define distinctive features

[vowels]
description = Distinctive vowel place features
str_maximal = true
context =
   |+high|-high|+low|-low|+back|-back|+round|-round|
  i|  X  |     |    |  X |     |  X  |      |   X  |
  y|  X  |     |    |  X |     |  X  |   X  |      |
  ?|  X  |     |    |  X |  X  |     |      |   X  |
  u|  X  |     |    |  X |  X  |     |   X  |      |
  e|     |  X  |    |  X |     |  X  |      |   X  |
  ø|     |  X  |    |  X |     |  X  |   X  |      |
  ?|     |  X  |    |  X |  X  |     |      |   X  |
  o|     |  X  |    |  X |  X  |     |   X  |      |
  æ|     |  X  |  X |    |     |  X  |      |   X  |
  œ|     |  X  |  X |    |     |  X  |   X  |      |
  ?|     |  X  |  X |    |  X  |     |      |   X  |
  ?|     |  X  |  X |    |  X  |     |   X  |      |
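The file is a plain INI file in which the table is an indented multi-line value. As a sketch of that layout only (features loads its files through fileconfig, which may differ in details), Python 3's standard configparser can read such a section; the people section below is a made-up example:

```python
import configparser

# A hypothetical INI file in the layout shown above (smaller table for brevity).
INI = '''
# people.ini - hypothetical example
[people]
description = Sex and age features
str_maximal = true
context =
       |+male|-male|+adult|-adult|
    man|  X  |     |   X  |      |
  woman|     |  X  |   X  |      |
    boy|  X  |     |      |   X  |
   girl|     |  X  |      |   X  |
'''

parser = configparser.ConfigParser()
parser.read_string(INI)

section = parser['people']
str_maximal = section.getboolean('str_maximal')  # 'true' -> True
# configparser strips the indentation of the continuation lines
rows = [line for line in section['context'].splitlines() if line.strip()]
header = [cell.strip() for cell in rows[0].split('|')]
properties = tuple(cell for cell in header if cell)
objects = tuple(row.split('|')[0].strip() for row in rows[1:])

print(section['description'], str_maximal)
print(objects, properties)
```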

Add your config file, overriding existing sections with the same name:

>>> features.add_config('examples/phonemes.ini')

If the filename is relative, it is resolved relative to the file where the add_config() function was called. Check the documentation of the fileconfig package for details.
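Resolving a path relative to the calling file can be done by inspecting the call stack. The hypothetical helper below sketches that behavior with the standard inspect module; it is not the fileconfig implementation:

```python
import inspect
import os


def resolve_relative_to_caller(filename):
    """Hypothetical helper: make filename absolute relative to the caller's file."""
    if os.path.isabs(filename):
        return filename
    caller = inspect.stack()[1].filename  # source file of the calling code
    return os.path.join(os.path.dirname(os.path.abspath(caller)), filename)


print(resolve_relative_to_caller('examples/phonemes.ini'))
```

Because the base directory comes from the caller's source file, the result does not depend on the current working directory.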

Load your feature system:

>>> fs = features.FeatureSystem('vowels')

>>> fs
<FeatureSystem('vowels') of 12 atoms 55 featuresets>

Retrieve feature sets, extents and intents:

>>> print(fs('+high'))
[+high -low]

>>> print('high round = %s, %s' % fs('high round').concept.extent)
high round = y, u

>>> print('i, e, o = %s' % fs.lattice[('i', 'e', 'o')].intent)
i, e, o = -low

Logical relations between feature pairs (excluding orthogonal pairs):

>>> print(fs.context.relations())  
+high  complement   -high
+low   complement   -low
+back  complement   -back
+round complement   -round
+high  incompatible +low
+high  implication  -low
+low   implication  -high
-high  subcontrary  -low
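These relations follow from comparing the extents of each feature pair. The sketch below re-encodes the columns of the vowels table as hypothetical row strings (one per vowel, in table order) and classifies every pair. Sorting by relation kind is an assumption that happens to reproduce the listing above:

```python
from itertools import combinations

PROPERTIES = ('+high', '-high', '+low', '-low', '+back', '-back', '+round', '-round')
# One string per vowel of the table above, in table order ('X' = has the feature).
ROWS = [
    'X..X.X.X', 'X..X.XX.', 'X..XX..X', 'X..XX.X.',
    '.X.X.X.X', '.X.X.XX.', '.X.XX..X', '.X.XX.X.',
    '.XX..X.X', '.XX..XX.', '.XX.X..X', '.XX.X.X.',
]
UNIVERSE = frozenset(range(len(ROWS)))
EXTENT = {p: frozenset(i for i, row in enumerate(ROWS) if row[j] == 'X')
          for j, p in enumerate(PROPERTIES)}
ORDER = ('complement', 'incompatible', 'implication', 'subcontrary', 'equivalent')


def relation(a, b):
    """Classify a feature pair by its extents; None means orthogonal."""
    x, y = EXTENT[a], EXTENT[b]
    if not (x & y):
        return ('complement' if (x | y) == UNIVERSE else 'incompatible', a, b)
    if x == y:
        return ('equivalent', a, b)
    if x < y:
        return ('implication', a, b)
    if y < x:
        return ('implication', b, a)  # report the direction that holds
    if (x | y) == UNIVERSE:
        return ('subcontrary', a, b)
    return None


found = [r for r in (relation(a, b) for a, b in combinations(PROPERTIES, 2)) if r]
found.sort(key=lambda r: ORDER.index(r[0]))  # stable sort keeps pair order
for kind, a, b in found:
    print('%-6s %-12s %s' % (a, kind, b))
```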

Advanced Usage

Visualization

Create a graph of the feature system lattice.

>>> import features

>>> fs = features.FeatureSystem('plural')

>>> dot = fs.graphviz()

>>> print(dot.source)  
// <FeatureSystem('plural') of 6 atoms 22 featuresets>
digraph plural {
    graph [margin=0]
    edge [arrowtail=none dir=back penwidth=.5]
            f0 [label="+1 &minus;1 +2 &minus;2 +3 &minus;3 +sg +pl &minus;sg &minus;pl"]
            f1 [label="+1 +sg"]
                    f1 -> f0
            f2 [label="+1 +pl"]
                    f2 -> f0
...

(Figure: rendered lattice of the plural feature system, fs-plural.svg)

See the documentation of the graphviz Python interface for details on the resulting object.
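The graph is the Hasse diagram of the lattice: one node per feature set and one edge from each feature set to each of its lower neighbors. The sketch below builds comparable DOT source by hand with string formatting (no graphviz package needed) for the small man/woman/boy/girl system; node names and attributes are chosen to resemble the output above but are otherwise hypothetical:

```python
from itertools import combinations

OBJECTS = ('man', 'woman', 'boy', 'girl')
PROPERTIES = ('+male', '-male', '+adult', '-adult')
CONTEXT = {
    'man': {'+male', '+adult'},
    'woman': {'-male', '+adult'},
    'boy': {'+male', '-adult'},
    'girl': {'-male', '-adult'},
}


def extent(features):
    return tuple(o for o in OBJECTS if set(features) <= CONTEXT[o])


def intent(objects):
    return tuple(p for p in PROPERTIES if all(p in CONTEXT[o] for o in objects))


# All closed extents (brute force), smallest first.
extents = sorted({extent(c) for n in range(len(PROPERTIES) + 1)
                  for c in combinations(PROPERTIES, n)},
                 key=lambda e: (len(e), [OBJECTS.index(o) for o in e]))


def lower_neighbors(e):
    """Maximal extents strictly below e."""
    below = [f for f in extents if set(f) < set(e)]
    return [f for f in below if not any(set(f) < set(g) for g in below)]


lines = ['digraph people {', '    edge [arrowtail=none dir=back]']
for i, e in enumerate(extents):
    lines.append('    f%d [label="%s"]' % (i, ' '.join(intent(e))))
    for f in lower_neighbors(e):
        lines.append('    f%d -> f%d' % (i, extents.index(f)))
lines.append('}')
source = '\n'.join(lines)
print(source)
```

The resulting text can be rendered with the Graphviz dot command-line tool.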

Customization

To customize the behavior of the feature sets, override the FeatureSet class attribute of FeatureSystem with a subclass that implements the behavior you want:

>>> class MyFeatures(features.FeatureSystem.FeatureSet):
...     @property
...     def features(self):
...         return list(self.concept.intent)

>>> class MyFeatureSystem(features.FeatureSystem):
...     FeatureSet = MyFeatures

>>> myfs = MyFeatureSystem('small')

>>> myfs('1 -pl')
MyFeatures('+1 -pl')

>>> myfs('1 -pl').features
['+1', '-2', '-pl']

Examples

Common features from tokens

In this example, we will create a paradigm for the present and past tense forms of the English copula to be (tokens) and compute the common features for all different word forms (types).

Define a feature system with the meanings for the paradigm cells.

>>> import features

>>> context = '''
...         |+1|-1|+2|-2|+3|-3|+sg|+pl|+pst|-pst|
... 1sg.pres| X|  |  | X|  | X|  X|   |    |   X|
... 1pl.pres| X|  |  | X|  | X|   |  X|    |   X|
... 2sg.pres|  | X| X|  |  | X|  X|   |    |   X|
... 2pl.pres|  | X| X|  |  | X|   |  X|    |   X|
... 3sg.pres|  | X|  | X| X|  |  X|   |    |   X|
... 3pl.pres|  | X|  | X| X|  |   |  X|    |   X|
... 1sg.past| X|  |  | X|  | X|  X|   |   X|    |
... 1pl.past| X|  |  | X|  | X|   |  X|   X|    |
... 2sg.past|  | X| X|  |  | X|  X|   |   X|    |
... 2pl.past|  | X| X|  |  | X|   |  X|   X|    |
... 3sg.past|  | X|  | X| X|  |  X|   |   X|    |
... 3pl.past|  | X|  | X| X|  |   |  X|   X|    |'''

>>> fs = features.make_features(context)

>>> cellmeanings = fs.atoms

Enter the word forms for each cell.

>>> cellforms = [
...     'am', 'are',
...     'are', 'are',
...     'is', 'are',
...
...     'was', 'were',
...     'were', 'were',
...     'was', 'were']

Create the paradigm as an ordered mapping (collections.OrderedDict) from meaning to form.

>>> from collections import OrderedDict

>>> paradigm = OrderedDict(zip(cellmeanings, cellforms))

Pretty-print the meaning -> word form mapping.

>>> for meaning, form in paradigm.items():
...     print('%s | %s' % (meaning.string_extent, form))
1sg.pres | am
1pl.pres | are
2sg.pres | are
2pl.pres | are
3sg.pres | is
3pl.pres | are
1sg.past | was
1pl.past | were
2sg.past | were
2pl.past | were
3sg.past | was
3pl.past | were

Create a correspondence from each word form to the list of cell meanings where it occurs.

>>> occurrences = OrderedDict()

>>> for meaning in paradigm:
...     form = paradigm[meaning]
...     occurrences.setdefault(form, []).append(meaning)

Pretty-print the form -> occurrences mapping.

>>> for form in occurrences:
...     meanings = occurrences[form]
...     labels = ', '.join(m.string_extent for m in meanings)
...     print('%4s | %s' % (form, labels))
  am | 1sg.pres
 are | 1pl.pres, 2sg.pres, 2pl.pres, 3pl.pres
  is | 3sg.pres
 was | 1sg.past, 3sg.past
were | 1pl.past, 2sg.past, 2pl.past, 3pl.past

Show the common features for all word forms, computed with the join()-method (generalization, least upper bound).

>>> for form in occurrences:
...     meanings = occurrences[form]
...     common = fs.join(meanings)
...     print('%4s | %s' % (form, common))
  am | [+1 +sg -pst]
 are | [-pst]
  is | [+3 +sg -pst]
 was | [-2 +sg +pst]
were | [+pst]

These are the necessary conditions (common features) of each word form: every cell in which the form occurs has them.
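Because the intent of a join is just the set of features shared by all joined cells, the same table can be sketched without the library, using plain set intersection over hand-written cell features (the helper names are hypothetical). It prints maximal feature lists rather than the minimal notation shown above:

```python
from collections import OrderedDict

PROPERTIES = ('+1', '-1', '+2', '-2', '+3', '-3', '+sg', '+pl', '+pst', '-pst')
PERSON = {'1': {'+1', '-2', '-3'}, '2': {'-1', '+2', '-3'}, '3': {'-1', '-2', '+3'}}
TENSE = {'pres': '-pst', 'past': '+pst'}


def features_of(label):
    """Features of a cell label like '3sg.pres' (hypothetical helper)."""
    person, number, tense = label[0], label[1:3], label.split('.')[1]
    return PERSON[person] | {'+' + number, TENSE[tense]}


labels = ['%s%s.%s' % (p, n, t)
          for t in ('pres', 'past') for p in '123' for n in ('sg', 'pl')]
forms = ['am', 'are', 'are', 'are', 'is', 'are',
         'was', 'were', 'were', 'were', 'was', 'were']

occurrences = OrderedDict()
for label, form in zip(labels, forms):
    occurrences.setdefault(form, []).append(features_of(label))

for form, meanings in occurrences.items():
    common = set.intersection(*meanings)  # features shared by all its cells
    print('%4s | %s' % (form, ' '.join(p for p in PROPERTIES if p in common)))
```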

API Reference

features.add_config(filename)

Add feature system definition file on top of the stack of config files.

Parameters: filename – Path to the INI-file with feature system definitions.

Note

If filename is a relative path, it is resolved relative to the directory of the caller (which may be different from the current working directory).

features.make_features(context, frmat='table', str_maximal=False)

Return a new feature system from context string in the given format.

Parameters:
  • context – Formal context table as plain-text string.
  • frmat – Format of the context string (‘table’, ‘cxt’, ‘csv’).
  • str_maximal (bool) – Use the maximal string representation for __str__().

Example

>>> make_features('''
...      |+male|-male|+adult|-adult|
... man  |  X  |     |   X  |      |
... woman|     |  X  |   X  |      |
... boy  |  X  |     |      |   X  |
... girl |     |  X  |      |   X  |
... ''')  
<FeatureSystem object of 4 atoms 10 featuresets at 0x...>

FeatureSystem

class features.FeatureSystem(config)

Feature set lattice defined by config instance.

__call__(string='')

Idempotently return featureset from parsed feature string.

atoms

Minimal non-infimum feature sets.

downset_union(featuresets)

Yield all featuresets that imply any of the given ones.

graphviz(highlight=None, maximal_label=None, topdown=None, filename=None, directory=None, render=False, view=False)

Return the system lattice visualization as graphviz source.

join(featuresets)

Return the nearest featureset that subsumes all given ones.

meet(featuresets)

Return the nearest featureset that implies all given ones.

upset_union(featuresets)

Yield all featuresets that subsume any of the given ones.

FeatureSet

class features.bases.FeatureSet(concept)

Formal concept intent as ordered set of features.

atoms

Subsumed atoms.

complement_of(other)

Empty common extent and universal extent union.

downset()

Subsumed neighbors.

implies(other)

Implication.

incompatible_with(other)

Empty common extent.

intersection(other)

Closest implied neighbor (generalization, join).

lower_neighbors

Immediate subsumed neighbors.

orthogonal_to(other)

Nonempty common extent, incomparable, nonempty extent union.

properly_implies(other)

Proper implication.

properly_subsumes(other)

Proper subsumption.

subcontrary_with(other)

Nonempty common extent and universal extent union.

subsumes(other)

Subsumption.

union(other)

Closest subsumed neighbor (unification, meet).

upper_neighbors

Immediate implied neighbors.

upset()

Implied neighbors.

Config

class features.Config(key, context, format='table', aliases=None, inherits=None, str_maximal=False, description=None)

Define possible feature combinations and their minimal specification.

Project Info

Changelog

Version 0.5.6

Port tests from nose/unittest to pytest, add Travis CI and coveralls.

Update meta data, tag Python 3.6 support.

Version 0.5.5

Simplified feature set string parsing.

Relaxed fileconfig, concepts, and graphviz dependencies to < 1.0.

Improved documentation.

Version 0.5.4

Added extended Sphinx-based documentation.

Fixed Python 3.5 compatibility.

Version 0.5.3

Fixed broken manual install due to setuptools automatic zip_safe analysis not working as expected.

Version 0.5.2

Added string_extent attribute to feature sets.

Added simple example to README.

Moved feature name sanity checks to parser.

Version 0.5.1

Added wheel.

Version 0.5

Added Python 3.3+ support.

Version 0.4.2

Switch setup.py dependencies to version ranges.

Version 0.4.1

Easier customization.

Improved documentation.

Version 0.4

Added add_config.

Added make_features.

Version 0.3

Added orthogonal_to.

Rename unification to union.

Improved doctests.

Version 0.2

Update concepts dependency to 0.5 and improve separation of concerns.

Changed upset and downset from properties to methods (backwards incompatible).

Order downsets longlex instead of shortlex.

Version 0.1.3

Update concepts dependency to 0.4.

Version 0.1.2

Fixed ineffective filename parameter in visualization.

Version 0.1.1

Fixed missing config.ini in package with non-source installation.

Version 0.1

First public release.

License

The MIT License (MIT)

Copyright (c) 2014-2016 Sebastian Bank

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.