
Complexity Measures Explained

Understanding what Cμ, hμ, E, and χ actually mean—with intuition and examples.

Prerequisites: Basic probability and logarithms; prior exposure to the concept of entropy is helpful but not required

Reading time: 30-40 minutes


Table of Contents

  1. Why Measure Complexity?
  2. Entropy: The Foundation
  3. Statistical Complexity (Cμ)
  4. Entropy Rate (hμ)
  5. Excess Entropy (E)
  6. Crypticity (χ)
  7. The Complexity-Entropy Plane
  8. Worked Examples
  9. Practical Usage in emic
  10. Common Questions

1. Why Measure Complexity?

1.1 Not All Sequences Are Equal

Consider three binary sequences:

Sequence A (Random coin flips):

1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 0 ...

Sequence B (Alternating):

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ...

Sequence C (Golden Mean—no consecutive 1s):

0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 ...

Which is most "complex"? The answer depends on what we mean by complexity:

| Sequence | Predictable? | Structured? | Interesting? |
|---|---|---|---|
| A (Random) | ❌ No | ❌ No | ❌ Boring |
| B (Periodic) | ✅ Yes | ✅ Yes | ❌ Trivial |
| C (Golden Mean) | 🔶 Partially | ✅ Yes | ✅ Interesting |

1.2 The Insight from Computational Mechanics

Computational mechanics provides measures that capture different aspects:

  • Cμ (Statistical Complexity): How much memory is needed to predict optimally?
  • hμ (Entropy Rate): How much is fundamentally unpredictable?
  • E (Excess Entropy): How much predictable structure exists?
  • χ (Crypticity): How much structure is hidden from direct observation?

These four numbers tell us almost everything about a process's complexity.


2. Entropy: The Foundation

2.1 What is Entropy?

Entropy measures uncertainty or surprise.

For a random variable \(X\) with probabilities \(p_1, p_2, \ldots, p_n\):

\[H(X) = -\sum_{i=1}^n p_i \log_2 p_i\]

Units: Bits (when using log base 2)

2.2 Intuition

| Scenario | Entropy | Intuition |
|---|---|---|
| Fair coin (50/50) | 1.0 bits | Maximum uncertainty for 2 outcomes |
| Biased coin (90/10) | 0.47 bits | Less uncertain—usually see heads |
| Certain outcome (100/0) | 0.0 bits | No uncertainty at all |

2.3 Examples

Fair coin: \(H = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1.0\) bits

Biased coin (p=0.9): \(H = -0.9 \log_2(0.9) - 0.1 \log_2(0.1) \approx 0.47\) bits

Fair die (6 sides): \(H = -6 \times \frac{1}{6} \log_2\left(\frac{1}{6}\right) = \log_2(6) \approx 2.58\) bits
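
These values are easy to verify yourself. Below is a minimal sketch in plain Python (no emic required); the helper name shannon_entropy is just for illustration:

import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0    fair coin
print(shannon_entropy([0.9, 0.1]))   # ~0.469 biased coin
print(shannon_entropy([1/6] * 6))    # ~2.585 fair die
print(shannon_entropy([1.0]))        # 0.0    certain outcome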

2.4 Key Properties

  • \(H(X) \geq 0\) (entropy is non-negative)
  • \(H(X) \leq \log_2(n)\) (maximum when uniform)
  • \(H(X) = 0\) only when one outcome is certain

3. Statistical Complexity (Cμ)

3.1 The Definition

Statistical complexity is the entropy of the causal state distribution:

\[C_\mu = H(S) = -\sum_{s \in \mathcal{S}} \pi_s \log_2 \pi_s\]

where \(\pi_s\) is the probability of being in state \(s\) at any given time (the stationary distribution).

3.2 What It Measures

Cμ answers: How much information about the past must we remember to predict optimally?

| Cμ Value | Interpretation |
|---|---|
| 0 bits | No memory needed (i.i.d. process) |
| 1 bit | 2 equally likely states |
| 2 bits | 4 equally likely states |
| High | Many states or unequal probabilities |

3.3 Intuition: The Memory Cost

Imagine you're a prediction machine:

  • You watch symbols stream by
  • You want to predict what comes next
  • You can store some information about the past

Cμ tells you the minimum storage (in bits) you need for optimal prediction.

3.4 Examples

Biased Coin (i.i.d.):

  • 1 state → \(C_\mu = 0\) bits
  • No memory helps—each symbol is independent

Golden Mean (2 states, with probabilities roughly 2/3 and 1/3):

  • \(C_\mu = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} \approx 0.918\) bits
  • Must remember whether the last symbol was 0 or 1

Alternating (0101...) (2 states, each with probability 1/2):

  • \(C_\mu = 1.0\) bit exactly
  • Must remember parity (even or odd position)
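
As these examples show, Cμ is just the entropy formula from Section 2 applied to the stationary state distribution. A minimal sketch in plain Python (independent of emic; in emic the same quantity is reported by analyze(), see Section 9):

import math

def state_entropy(pi):
    """Cμ = H(S): entropy of the stationary causal-state distribution, in bits."""
    return -sum(p * math.log2(p) for p in pi if p > 0)

print(state_entropy([1.0]))        # 0.0    biased coin: a single causal state
print(state_entropy([2/3, 1/3]))   # ~0.918 Golden Mean: two unequal states
print(state_entropy([0.5, 0.5]))   # 1.0    alternating: two equally likely states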


4. Entropy Rate (hμ)

4.1 The Definition

Entropy rate is the average uncertainty per symbol, given optimal prediction:

\[h_\mu = \lim_{n \to \infty} \frac{H(X_1, X_2, \ldots, X_n)}{n}\]

Or equivalently:

\[h_\mu = H(X_{t+1} | S_t) = -\sum_{s \in \mathcal{S}} \pi_s \sum_{x \in \mathcal{A}} P(x|s) \log_2 P(x|s)\]

4.2 What It Measures

hμ answers: How much randomness is intrinsic to the process?

| hμ Value | Interpretation |
|---|---|
| 0 bits/symbol | Completely deterministic |
| 1 bit/symbol | Maximum randomness (binary alphabet) |
| In between | Partially predictable |

4.3 Intuition: Irreducible Randomness

Even with perfect knowledge of the current state, you still can't predict perfectly. hμ measures this irreducible uncertainty.

  • Low hμ → Process is mostly predictable
  • High hμ → Process is mostly random

4.4 Examples

Periodic (0101...):

  • \(h_\mu = 0\) bits/symbol
  • Given your position, you know exactly what comes next

Fair Coin:

  • \(h_\mu = 1.0\) bit/symbol
  • Each symbol is completely unpredictable

Golden Mean (p=0.5):

  • From state A: 50% chance of 0, 50% chance of 1 → 1 bit of uncertainty
  • From state B: 100% chance of 0 → 0 bits of uncertainty
  • Weighted average: \(h_\mu = \frac{2}{3}(1.0) + \frac{1}{3}(0) \approx 0.667\) bits/symbol
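
The state-conditional formula from Section 4.1 is easy to evaluate once you know the stationary probabilities \(\pi_s\) and the per-state emission probabilities \(P(x|s)\). A minimal sketch in plain Python for the two machines described above (the helper name is just for illustration):

import math

def entropy_rate_from_states(pi, emissions):
    """hμ = sum_s pi_s * H(X | S = s), in bits per symbol."""
    h = 0.0
    for p_state, probs in zip(pi, emissions):
        h -= p_state * sum(p * math.log2(p) for p in probs if p > 0)
    return h

# Golden Mean: state A emits 0 or 1 with probability 1/2 each; state B always emits 0.
print(entropy_rate_from_states([2/3, 1/3], [[0.5, 0.5], [1.0]]))   # ~0.667

# Periodic 0101...: both states emit their next symbol with certainty.
print(entropy_rate_from_states([0.5, 0.5], [[1.0], [1.0]]))        # 0.0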

4.5 Relationship to Entropy

The entropy rate is always less than or equal to the per-symbol entropy:

\[h_\mu \leq H(X)\]

The difference comes from predictability—correlations between symbols. For the Golden Mean process, for example, \(H(X) \approx 0.918\) bits while \(h_\mu \approx 0.667\) bits/symbol, because the previous symbol partially constrains the next one.


5. Excess Entropy (E)

5.1 The Definition

Excess entropy is the mutual information between past and future:

\[E = I(\overleftarrow{X}; \overrightarrow{X})\]

where \(\overleftarrow{X}\) is the entire past and \(\overrightarrow{X}\) is the entire future.

5.2 What It Measures

E answers: How much information does the past provide about the future?

| E Value | Interpretation |
|---|---|
| 0 bits | Past tells nothing about future (i.i.d.) |
| High | Strong past-future correlation |

5.3 Intuition: Total Predictable Structure

Excess entropy measures the total amount of predictable structure in the process.

  • E = 0: No correlations (random)
  • E = ∞: Infinitely long correlations (possible in some processes)

5.4 Computation

For a finite-state process:

\[E = C_\mu - \chi\]

where χ is crypticity (see next section).

Or directly:

\[E = \lim_{L \to \infty} \left[ H(X_1, \ldots, X_L) - L \cdot h_\mu \right]\]

5.5 Examples

Fair Coin: \(E = 0\) (no correlations)

Golden Mean: \(E \approx 0.252\) bits (seeing a 1 tells us the next symbol must be 0)

Periodic: \(E = \log_2(\text{period})\) (correlations extend to period length)
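
The block-entropy limit from Section 5.4 can be checked numerically. Below is a minimal sketch in plain Python for the Golden Mean machine (states and transition probabilities as in Section 4.4); because that process is first-order Markov, \(H(L) - L \cdot h_\mu\) has already converged at \(L = 1\):

import math
from itertools import product

# Golden Mean ε-machine:  A --0 (1/2)--> A,  A --1 (1/2)--> B,  B --0 (1)--> A
TRANSITIONS = {"A": {0: ("A", 0.5), 1: ("B", 0.5)}, "B": {0: ("A", 1.0)}}
PI = {"A": 2/3, "B": 1/3}   # stationary state distribution
H_MU = 2/3                  # entropy rate from Section 4.4

def word_probability(word):
    """Stationary probability of observing a given word."""
    total = 0.0
    for state, p in PI.items():
        for symbol in word:
            if symbol not in TRANSITIONS[state]:
                p = 0.0
                break
            state, step_prob = TRANSITIONS[state][symbol]
            p *= step_prob
        total += p
    return total

def block_entropy(length):
    """H(L): Shannon entropy of length-L blocks, in bits."""
    return -sum(p * math.log2(p)
                for word in product((0, 1), repeat=length)
                if (p := word_probability(word)) > 0)

for length in (1, 2, 3, 4):
    print(length, block_entropy(length) - length * H_MU)   # ~0.252 at every L

For processes with longer-range structure, such as the Even Process in Section 8.4, the same estimate keeps growing with L before leveling off at E.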


6. Crypticity (χ)

6.1 The Definition

Crypticity is the difference between statistical complexity and excess entropy:

\[\chi = C_\mu - E\]

6.2 What It Measures

χ answers: How much of the stored information is "hidden" from direct observation?

6.3 Intuition: Hidden Information

Some processes store information that affects their behavior but isn't directly visible in the output. This "cryptic" information is captured by χ.

Example: Imagine a machine that remembers a secret and only reveals it through subtle correlations. The memory (Cμ) is high, but the visible structure (E) is low. The difference (χ) represents the hidden part.

6.4 Properties

  • \(\chi \geq 0\) always
  • \(\chi = 0\) means all stored information is directly observable
  • \(\chi > 0\) means some information is "encrypted" in the state

6.5 Examples

Golden Mean:

  • \(C_\mu \approx 0.918\) bits, \(E \approx 0.252\) bits
  • \(\chi = 0.918 - 0.252 \approx 0.667\) bits

This means most of the stored information (roughly two-thirds of a bit, about 73% of Cμ) is "cryptic"—it affects future behavior but isn't immediately obvious from short observations.
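
A minimal plain-Python illustration, plugging in the (Cμ, E) values from the earlier examples:

# Crypticity is just Cμ minus E.  (Cμ, E) values from Sections 3.4 and 5.5:
examples = {
    "Fair coin":   (0.0,   0.0),
    "Alternating": (1.0,   1.0),
    "Golden Mean": (0.918, 0.252),
}
for name, (c_mu, e) in examples.items():
    print(f"{name:11s}  χ = {c_mu - e:.3f} bits")
# Fair coin    χ = 0.000 bits  (nothing stored, so nothing hidden)
# Alternating  χ = 0.000 bits  (everything stored is visible as past-future correlation)
# Golden Mean  χ = 0.667 bits  (most of the stored information is cryptic)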


7. The Complexity-Entropy Plane

7.1 Visualizing Process Types

The Cμ-hμ plane reveals different regimes of behavior:

       Cμ (Statistical Complexity)
   High │    ●  Complex            ●  Chaotic
        │       processes             edge-of-chaos
   Med  │    ●  Structured         ●  Random with
        │       stochastic            structure
   Low  │    ●  Simple             ●  Pure
        │       periodic              random
        └─────────────────────────────→ hμ
             Low              High     (Entropy Rate)

7.2 Process Archetypes

| Location | Example | Cμ | hμ |
|---|---|---|---|
| Bottom-left | Constant (000...) | 0 | 0 |
| Bottom-right | Fair coin | 0 | 1 |
| Top-left | Period-1000 | ~10 | 0 |
| Middle | Golden Mean | ~0.9 | ~0.67 |

7.3 The "Interesting" Region

The most interesting processes live in the middle—neither purely random nor purely deterministic. They have:

  • Moderate Cμ: Some structure to remember
  • Moderate hμ: Some genuine unpredictability
  • High E: Rich correlations between past and future

Examples: natural language, biological sequences, financial time series.


8. Worked Examples

8.1 The Biased Coin

A coin that shows heads with probability 0.7:

from emic.sources import BiasedCoinSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Generate 10,000 symbols from a coin that shows heads with probability 0.7
source = BiasedCoinSource(p=0.7, _seed=42)
data = TakeN(10_000)(source)

# Infer the ε-machine with CSSR (histories up to length 3), then compute the measures
result = CSSR(CSSRConfig(max_history=3)).infer(data)
summary = analyze(result.machine)

print(f"States: {summary.num_states}")           # 1
print(f"Cμ = {summary.statistical_complexity}")  # 0.0
print(f"hμ = {summary.entropy_rate}")            # ~0.88
print(f"E  = {summary.excess_entropy}")          # 0.0

Interpretation:

  • 1 state: No memory helps
  • Cμ = 0: Nothing to remember
  • hμ ≈ 0.88: The binary entropy of a p = 0.7 coin, close to but below the 1-bit maximum
  • E = 0: No correlations

8.2 The Periodic Process

A process that repeats "01":

from emic.sources import PeriodicSource, TakeN

source = PeriodicSource(pattern=(0, 1))
data = TakeN(1_000)(source)

result = CSSR(CSSRConfig(max_history=3)).infer(data)
summary = analyze(result.machine)

print(f"States: {summary.num_states}")           # 2
print(f"Cμ = {summary.statistical_complexity}")  # 1.0
print(f"hμ = {summary.entropy_rate}")            # 0.0
print(f"E  = {summary.excess_entropy}")          # 1.0

Interpretation:

  • 2 states: "Just saw 0" and "Just saw 1"
  • Cμ = 1.0: Need 1 bit to store which state
  • hμ = 0: Perfectly predictable
  • E = 1.0: Strong past-future correlation

8.3 The Golden Mean Process

No consecutive 1s allowed:

from emic.sources import GoldenMeanSource, TakeN

source = GoldenMeanSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)

result = CSSR(CSSRConfig(max_history=5)).infer(data)
summary = analyze(result.machine)

print(f"States: {summary.num_states}")           # 2
print(f"Cμ = {summary.statistical_complexity:.3f}")  # ~0.918
print(f"hμ = {summary.entropy_rate:.3f}")            # ~0.667
print(f"E  = {summary.excess_entropy:.3f}")          # ~0.459
print(f"χ  = {summary.crypticity:.3f}")              # ~0.459

Interpretation:

  • 2 states: "Can emit 0 or 1" vs "Must emit 0"
  • Cμ ≈ 0.918: Less than 1 bit (unequal state probabilities)
  • hμ ≈ 0.667: Moderate randomness
  • E ≈ 0.252: Moderate structure
  • χ ≈ 0.667: Most of Cμ is "cryptic"

8.4 The Even Process

1s must come in pairs:

from emic.sources import EvenProcessSource, TakeN

source = EvenProcessSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)

result = CSSR(CSSRConfig(max_history=5, post_merge=True)).infer(data)
summary = analyze(result.machine)

print(f"States: {summary.num_states}")           # 2
print(f"Cμ = {summary.statistical_complexity:.3f}")  # ~1.0
print(f"hμ = {summary.entropy_rate:.3f}")            # ~0.5
print(f"E  = {summary.excess_entropy:.3f}")          # ~0.5

Interpretation:

  • 2 states: "Neutral" vs "Must emit another 1"
  • Higher structure, lower randomness than Golden Mean


9. Practical Usage in emic

9.1 The analyze() Function

from emic.analysis import analyze

summary = analyze(machine)

# Access all measures
summary.statistical_complexity  # Cμ in bits
summary.entropy_rate           # hμ in bits/symbol
summary.excess_entropy         # E in bits
summary.crypticity             # χ in bits

# Structural info
summary.num_states             # Number of causal states
summary.num_transitions        # Number of transitions
summary.topological_complexity # log₂(num_states)

9.2 Individual Functions

from emic.analysis import (
    statistical_complexity,
    entropy_rate,
    excess_entropy,
)

c_mu = statistical_complexity(machine)
h_mu = entropy_rate(machine)
e = excess_entropy(machine)

9.3 Comparing Inferred vs True

from emic.sources import GoldenMeanSource, TakeN
from emic.inference import Spectral, SpectralConfig
from emic.analysis import analyze

source = GoldenMeanSource(p=0.5, _seed=42)

# Theoretical machine
true_summary = analyze(source.true_machine)
print(f"True Cμ = {true_summary.statistical_complexity:.4f}")

# Inferred machine
data = TakeN(10_000)(source)
result = Spectral(SpectralConfig()).infer(data)
inferred_summary = analyze(result.machine)
print(f"Inferred Cμ = {inferred_summary.statistical_complexity:.4f}")

# Error
error = abs(true_summary.statistical_complexity - inferred_summary.statistical_complexity)
print(f"Error = {error:.4f}")

10. Common Questions

10.1 Why is Cμ sometimes called "statistical complexity"?

The word "statistical" distinguishes it from other complexity measures (like Kolmogorov complexity). It's computed from the stationary probability distribution, hence "statistical."

10.2 What if Cμ = 0?

This means the process is i.i.d. (independent and identically distributed). No memory helps prediction—each symbol is independent of the past.

10.3 What if hμ = 0?

This means the process is deterministic. Given the current state, the next symbol is completely predictable.

10.4 Can E > Cμ?

No! We always have:

\[E \leq C_\mu\]

This is because the past-future information (E) can't exceed the information stored in the causal state (Cμ).
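
A one-line argument makes this concrete: the causal state is a function of the past, and it captures everything the past says about the future, so

\[E = I(\overleftarrow{X}; \overrightarrow{X}) = I(S; \overrightarrow{X}) \leq H(S) = C_\mu\]

The middle equality holds because the causal state is a sufficient statistic of the past for the future; the inequality is just mutual information being bounded by entropy.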

10.5 What's the relationship between these measures?

Key equations:

  • \(\chi = C_\mu - E\) (definition of crypticity)
  • \(E \leq C_\mu\) (structure can't exceed storage)
  • \(h_\mu \leq H(X)\) (entropy rate ≤ marginal entropy)

10.6 Which measure should I report?

For most purposes, report Cμ and hμ together. They give the complete picture:

  • Cμ: The memory/structure side
  • hμ: The randomness side

Add E if you're studying correlations, and χ if you're interested in "hidden" structure.

10.7 Are these measures robust to noise?

Moderate noise (< 10%) typically:

  • Increases Cμ slightly (noise adds spurious states)
  • Increases hμ (more apparent randomness)
  • Decreases E (correlations are obscured)

Use BitFlipNoise to study robustness:

from emic.sources import GoldenMeanSource, TakeN, BitFlipNoise

source = GoldenMeanSource(p=0.5, _seed=42)
noisy = source >> BitFlipNoise(flip_prob=0.05, seed=123) >> TakeN(10_000)
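
From there, one way to study robustness is to infer machines from the clean and noisy streams and compare their measures. Below is a sketch using the same calls as Section 8, under the assumption that the >> pipeline yields a finite sequence that .infer() accepts, just like TakeN(...)(source) did in the earlier examples:

from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Clean sequence from the same source, for comparison
clean = TakeN(10_000)(source)

clean_summary = analyze(CSSR(CSSRConfig(max_history=5)).infer(clean).machine)
noisy_summary = analyze(CSSR(CSSRConfig(max_history=5)).infer(noisy).machine)

print(f"clean: Cμ = {clean_summary.statistical_complexity:.3f}, hμ = {clean_summary.entropy_rate:.3f}")
print(f"noisy: Cμ = {noisy_summary.statistical_complexity:.3f}, hμ = {noisy_summary.entropy_rate:.3f}")
# Expect Cμ and hμ to come out somewhat higher for the noisy stream.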

Summary

| Measure | Symbol | Question Answered | Range |
|---|---|---|---|
| Statistical Complexity | Cμ | How much memory for optimal prediction? | 0 to ∞ |
| Entropy Rate | hμ | How much is irreducibly random? | 0 to log₂(alphabet size) |
| Excess Entropy | E | How much past-future correlation? | 0 to ∞ |
| Crypticity | χ | How much structure is hidden? | 0 to Cμ |

Key relationships:

  • Random processes: Cμ = 0, hμ = max
  • Deterministic processes: hμ = 0
  • "Interesting" processes: Moderate Cμ and hμ
  • Always: E ≤ Cμ, χ = Cμ - E

These measures, developed through computational mechanics, provide a principled vocabulary for describing the complexity of stochastic processes.