Total Correlation

The total correlation [8], denoted \(\T\), also known as the multi-information or integration, is one generalization of the mutual information to more than two variables. It is defined as the amount by which the sum of the individual variables' entropies exceeds their joint entropy:

\[\begin{split}\T[X_{0:n}] &= \sum_{i} \H[X_i] - \H[X_{0:n}] \\ &= \sum_{x_{0:n} \in X_{0:n}} p(x_{0:n}) \log_2 \frac{p(x_{0:n})}{\prod_{i} p(x_i)}\end{split}\]
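Both forms of the definition can be checked numerically. The following is a minimal plain-Python sketch, independent of dit; the helper names (`entropy`, `marginal`, `total_correlation`, `total_correlation_kl`) are hypothetical and not dit's API. It assumes a joint distribution stored as a dict mapping equal-length outcome strings (or tuples) to probabilities, and shows that the entropy form and the log-ratio form agree, including a value of zero for two independent bits.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, i):
    """Marginal distribution of the i-th variable (hypothetical helper)."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[outcome[i]] += p
    return dict(out)


def total_correlation(pmf):
    """First form: sum_i H[X_i] - H[X_{0:n}]."""
    n = len(next(iter(pmf)))
    return sum(entropy(marginal(pmf, i)) for i in range(n)) - entropy(pmf)


def total_correlation_kl(pmf):
    """Second form: sum over outcomes of p(x) log2(p(x) / prod_i p(x_i))."""
    n = len(next(iter(pmf)))
    margs = [marginal(pmf, i) for i in range(n)]
    total = 0.0
    for outcome, p in pmf.items():
        if p == 0:
            continue
        prod = 1.0
        for i, x in enumerate(outcome):
            prod *= margs[i][x]
        total += p * log2(p / prod)
    return total


# A three-variable "giant bit" and two independent fair bits.
giant_bit = {'000': 0.5, '111': 0.5}
independent = {'00': 0.25, '01': 0.25, '10': 0.25, '11': 0.25}
print(total_correlation(giant_bit), total_correlation_kl(giant_bit))  # 2.0 2.0
print(total_correlation(independent), total_correlation_kl(independent))  # 0.0 0.0
```

The entropy form is the cheaper one to compute, since it only needs the marginals and the joint entropy; the log-ratio form makes the link to independence explicit, because every ratio equals 1 exactly when the joint factorizes.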

Two nice features of the total correlation are that it is non-negative and that it is zero if and only if the random variables \(X_{0:n}\) are mutually independent. Some baseline behavior is also worth noting. First, consider its behavior on “giant bit” distributions, in which every variable is a copy of the same fair bit:

>>> from dit import Distribution as D
>>> from dit.algorithms import total_correlation as T
>>> [ T(D(['0'*n, '1'*n], [0.5, 0.5])) for n in range(2, 6) ]
[1.0, 2.0, 3.0, 4.0]

Since every variable carries one bit and the joint distribution also carries only one bit, the total correlation of a giant bit distribution over \(n\) variables is \(n - 1\), one less than the number of variables. The second family to consider is the general parity distributions:

>>> from dit.example_dists import n_mod_m
>>> [ T(n_mod_m(n, 2)) for n in range(3, 6) ]
[1.0, 1.0, 1.0]
>>> [ T(n_mod_m(3, m)) for m in range(2, 5) ]
[1.0, 1.58496250072, 2.0]

Here we see that the total correlation is equal to \(\log_2{m}\) regardless of \(n\).
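The reason is that each of the \(n\) variables is uniform over \(m\) symbols, while the joint distribution is uniform over \(m^{n-1}\) outcomes, so \(\T = n \log_2 m - (n-1) \log_2 m = \log_2 m\). The sketch below reproduces this in plain Python. `parity_dist` is a hypothetical stand-in, not dit's `n_mod_m`: it assumes the first \(n-1\) variables are uniform and the last is chosen so that all values sum to 0 modulo \(m\).

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def total_correlation(pmf):
    """sum_i H[X_i] - H[X_{0:n}] for a pmf keyed by equal-length tuples."""
    n = len(next(iter(pmf)))
    marginals = [defaultdict(float) for _ in range(n)]
    for outcome, p in pmf.items():
        for i, x in enumerate(outcome):
            marginals[i][x] += p
    return sum(entropy(marg) for marg in marginals) - entropy(pmf)


def parity_dist(n, m):
    """Hypothetical parity distribution (not dit's n_mod_m): the first n - 1
    variables are uniform over range(m); the last makes the sum 0 mod m."""
    pmf = {}
    for head in product(range(m), repeat=n - 1):
        pmf[head + ((-sum(head)) % m,)] = 1 / m ** (n - 1)
    return pmf


for n in range(3, 6):
    print(n, total_correlation(parity_dist(n, 2)))  # approximately 1.0 for each n
for m in range(2, 5):
    print(m, total_correlation(parity_dist(3, m)), log2(m))  # last two columns agree
```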

Figure: The total correlation \(\T[X:Y]\).
Figure: The total correlation \(\T[X:Y:Z]\).
total_correlation(dist, rvs=None, crvs=None, rv_names=None)
Parameters:
  • dist (Distribution) – The distribution from which the total correlation is calculated.
  • rvs (list, None) – The indexes of the random variables used to calculate the total correlation. If None, then the total correlation is calculated over all random variables.
  • crvs (list, None) – The indexes of the random variables to condition on. If None, then no variables are conditioned on.
  • rv_names (bool, None) – If True, then the elements of rvs are treated as random variable names. If False, then the elements of rvs are treated as random variable indexes. If None, then the value True is used if the distribution has specified names for its random variables.
Returns:

T (float) – The total correlation

Raises:

ditException – Raised if dist is not a joint distribution.
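To illustrate what the rvs and crvs parameters select, here is a hedged plain-Python sketch of the conditional quantity, assuming the usual form \(\T[X_{rvs} | Z] = \sum_{i} \H[X_i | Z] - \H[X_{rvs} | Z]\) with \(Z\) the conditioning variables. The function `conditional_total_correlation` is hypothetical, not dit's implementation, and takes rvs and crvs as flat lists of indexes. The XOR example shows that conditioning can raise the total correlation: two independent bits have zero total correlation, but one bit once their XOR is known.

```python
from collections import defaultdict
from math import log2


def entropy_of(pmf, idxs):
    """Joint entropy in bits of the variables at positions idxs."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idxs)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def conditional_entropy(pmf, idxs, cond):
    """H[X_idxs | X_cond] = H[X_idxs, X_cond] - H[X_cond]."""
    return entropy_of(pmf, list(idxs) + list(cond)) - entropy_of(pmf, cond)


def conditional_total_correlation(pmf, rvs=None, crvs=None):
    """Hypothetical sketch: sum_i H[X_i | Z] - H[X_rvs | Z], Z = X_crvs.
    rvs=None means all variables; crvs=None means no conditioning."""
    n = len(next(iter(pmf)))
    rvs = list(range(n)) if rvs is None else list(rvs)
    crvs = [] if crvs is None else list(crvs)
    parts = sum(conditional_entropy(pmf, [i], crvs) for i in rvs)
    return parts - conditional_entropy(pmf, rvs, crvs)


# X0, X1 are independent fair bits and X2 = X0 XOR X1.
xor = {'000': 0.25, '011': 0.25, '101': 0.25, '110': 0.25}
print(conditional_total_correlation(xor, rvs=[0, 1]))  # 0.0
print(conditional_total_correlation(xor, rvs=[0, 1], crvs=[2]))  # 1.0
print(conditional_total_correlation(xor))  # 1.0
```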
