Complexity Measures Explained¶
Understanding what Cμ, hμ, E, and χ actually mean—with intuition and examples.
Prerequisites: Basic probability, logarithms, entropy concept helpful but not required
Reading time: 30-40 minutes
Table of Contents¶
- Why Measure Complexity?
- Entropy: The Foundation
- Statistical Complexity (Cμ)
- Entropy Rate (hμ)
- Excess Entropy (E)
- Crypticity (χ)
- The Complexity-Entropy Plane
- Worked Examples
- Practical Usage in emic
- Common Questions
1. Why Measure Complexity?¶
1.1 Not All Sequences Are Equal¶
Consider three binary sequences:
Sequence A (Random coin flips): 0110100111010001011...
Sequence B (Alternating): 0101010101010101010...
Sequence C (Golden Mean—no consecutive 1s): 0100101001001010010...
Which is most "complex"? The answer depends on what we mean by complexity:
| Sequence | Predictable? | Structured? | Interesting? |
|---|---|---|---|
| A (Random) | ❌ No | ❌ No | ❌ Boring |
| B (Periodic) | ✅ Yes | ✅ Yes | ❌ Trivial |
| C (Golden Mean) | 🔶 Partially | ✅ Yes | ✅ Interesting |
1.2 The Insight from Computational Mechanics¶
Computational mechanics provides measures that capture different aspects:
- Cμ (Statistical Complexity): How much memory is needed to predict optimally?
- hμ (Entropy Rate): How much is fundamentally unpredictable?
- E (Excess Entropy): How much predictable structure exists?
- χ (Crypticity): How much structure is hidden from direct observation?
These four numbers tell us almost everything about a process's complexity.
2. Entropy: The Foundation¶
2.1 What is Entropy?¶
Entropy measures uncertainty or surprise.
For a random variable \(X\) with probabilities \(p_1, p_2, \ldots, p_n\):
\[
H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i
\]
Units: Bits (when using log base 2)
2.2 Intuition¶
| Scenario | Entropy | Intuition |
|---|---|---|
| Fair coin (50/50) | 1.0 bits | Maximum uncertainty for 2 outcomes |
| Biased coin (90/10) | 0.47 bits | Less uncertain—usually see heads |
| Certain outcome (100/0) | 0.0 bits | No uncertainty at all |
2.3 Examples¶
Fair coin: \(H = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1.0\) bits
Biased coin (p=0.9): \(H = -0.9 \log_2(0.9) - 0.1 \log_2(0.1) = 0.47\) bits
Fair die (6 sides): \(H = -6 \times \frac{1}{6} \log_2\left(\frac{1}{6}\right) = \log_2(6) = 2.58\) bits
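These numbers are easy to check in code. The following minimal sketch uses only plain Python (no emic calls) to evaluate the entropy formula for the three examples above:
import math
def shannon_entropy(probs):
    # Entropy in bits of a discrete distribution given as a list of probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)
print(shannon_entropy([0.5, 0.5]))   # 1.0   (fair coin)
print(shannon_entropy([0.9, 0.1]))   # ~0.47 (biased coin)
print(shannon_entropy([1/6] * 6))    # ~2.58 (fair die)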
2.4 Key Properties¶
- \(H(X) \geq 0\) (entropy is non-negative)
- \(H(X) \leq \log_2(n)\) (maximum when uniform)
- \(H(X) = 0\) only when one outcome is certain
3. Statistical Complexity (Cμ)¶
3.1 The Definition¶
Statistical complexity is the entropy of the causal state distribution:
\[
C_\mu = -\sum_{s \in \mathcal{S}} \pi_s \log_2 \pi_s
\]
where \(\pi_s\) is the probability of being in state \(s\) at any given time (the stationary distribution).
3.2 What It Measures¶
Cμ answers: How much information about the past must we remember to predict optimally?
| Cμ Value | Interpretation |
|---|---|
| 0 bits | No memory needed (i.i.d. process) |
| 1 bit | 2 equally likely states |
| 2 bits | 4 equally likely states |
| High | Many states or unequal probabilities |
3.3 Intuition: The Memory Cost¶
Imagine you're a prediction machine:
- You watch symbols stream by
- You want to predict what comes next
- You can store some information about the past
Cμ tells you the minimum storage (in bits) you need for optimal prediction.
3.4 Examples¶
Biased Coin (i.i.d.):
- 1 state → \(C_\mu = 0\) bits
- No memory helps—each symbol is independent

Golden Mean (2 states, roughly 2/3 and 1/3):
- \(C_\mu = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} \approx 0.918\) bits
- Must remember if the last symbol was 0 or 1

Alternating (0101...) (2 states, each 1/2):
- \(C_\mu = 1.0\) bit exactly
- Must remember parity (even or odd position)
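Since Cμ is just the Shannon entropy of the stationary state distribution, the three values above can be reproduced in plain Python (the state probabilities are taken from the descriptions above, not computed from data):
import math
def state_entropy(pi):
    # Cμ in bits: entropy of the stationary causal-state distribution
    return -sum(p * math.log2(p) for p in pi if p > 0)
print(state_entropy([1.0]))        # 0.0    (biased coin: one state)
print(state_entropy([2/3, 1/3]))   # ~0.918 (Golden Mean)
print(state_entropy([0.5, 0.5]))   # 1.0    (alternating process)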
4. Entropy Rate (hμ)¶
4.1 The Definition¶
Entropy rate is the average uncertainty per symbol, given optimal prediction:
\[
h_\mu = \lim_{L \to \infty} \frac{H(X_1, X_2, \ldots, X_L)}{L}
\]
Or equivalently, the uncertainty about the next symbol given the current causal state:
\[
h_\mu = H[X_0 \mid \mathcal{S}] = -\sum_{s \in \mathcal{S}} \pi_s \sum_{x} \Pr(x \mid s) \log_2 \Pr(x \mid s)
\]
where \(\Pr(x \mid s)\) is the probability that state \(s\) emits symbol \(x\).
4.2 What It Measures¶
hμ answers: How much randomness is intrinsic to the process?
| hμ Value | Interpretation |
|---|---|
| 0 bits/symbol | Completely deterministic |
| 1 bit/symbol | Maximum randomness (binary) |
| Between | Partially predictable |
4.3 Intuition: Irreducible Randomness¶
Even with perfect knowledge of the current state, you still can't predict perfectly. hμ measures this irreducible uncertainty.
- Low hμ → Process is mostly predictable
- High hμ → Process is mostly random
4.4 Examples¶
Periodic (0101...):
- \(h_\mu = 0\) bits/symbol
- Given your position, you know exactly what comes next

Fair Coin:
- \(h_\mu = 1.0\) bit/symbol
- Each symbol is completely unpredictable

Golden Mean (p=0.5):
- From state A: 50% chance of 0, 50% chance of 1 → 1 bit uncertainty
- From state B: 100% chance of 0 → 0 bits uncertainty
- Weighted average: \(h_\mu = \frac{2}{3}(1.0) + \frac{1}{3}(0) = 0.667\) bits/symbol
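The Golden Mean calculation generalizes: hμ is the stationary-weighted average of each state's next-symbol entropy. A minimal plain-Python sketch, with the state probabilities and emission distributions taken from the description above:
import math
def symbol_entropy(dist):
    # Entropy in bits of a single state's next-symbol distribution
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)
# Golden Mean: state A emits 0 or 1 with probability 1/2 each; state B always emits 0
states = {
    "A": {"pi": 2/3, "emit": {"0": 0.5, "1": 0.5}},
    "B": {"pi": 1/3, "emit": {"0": 1.0}},
}
h_mu = sum(s["pi"] * symbol_entropy(s["emit"]) for s in states.values())
print(h_mu)   # ~0.667 bits/symbol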
4.5 Relationship to Entropy¶
The entropy rate is always less than or equal to the per-symbol (marginal) entropy:
\[
h_\mu \leq H(X)
\]
The difference comes from predictability—correlations between symbols.
5. Excess Entropy (E)¶
5.1 The Definition¶
Excess entropy is the mutual information between past and future:
\[
E = I[\overleftarrow{X}\,;\,\overrightarrow{X}]
\]
where \(\overleftarrow{X}\) is the entire past and \(\overrightarrow{X}\) is the entire future.
5.2 What It Measures¶
E answers: How much information does the past provide about the future?
| E Value | Interpretation |
|---|---|
| 0 bits | Past tells nothing about future (i.i.d.) |
| High | Strong past-future correlation |
5.3 Intuition: Total Predictable Structure¶
Excess entropy measures the total amount of predictable structure in the process.
- E = 0: No correlations (random)
- E = ∞: Infinitely long correlations (possible in some processes)
5.4 Computation¶
For a finite-state process:
\[
E = C_\mu - \chi
\]
where χ is crypticity (see next section).
Or directly, from the convergence of the block entropies:
\[
E = \lim_{L \to \infty} \bigl[ H(X_1, \ldots, X_L) - L\, h_\mu \bigr]
\]
5.5 Examples¶
Fair Coin: \(E = 0\) (no correlations)
Golden Mean: \(E \approx 0.252\) bits (knowing we just saw a 1 tells us the next symbol must be 0)
Periodic: \(E = \log_2(\text{period})\) (correlations extend to period length)
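To make the direct formula concrete, here is a plain-Python sketch (assuming the standard two-state Golden Mean machine from Section 3.4) that computes word probabilities, evaluates the block entropy \(H(L)\), and shows \(H(L) - L h_\mu\) settling at \(E \approx 0.252\) bits:
import math
from itertools import product
# Golden Mean ε-machine: (state, symbol) -> (emission probability, next state)
T = {
    ("A", "0"): (0.5, "A"),
    ("A", "1"): (0.5, "B"),
    ("B", "0"): (1.0, "A"),
}
pi = {"A": 2/3, "B": 1/3}   # stationary state distribution
h_mu = 2/3                  # entropy rate (see Section 4.4)
def word_prob(word):
    # Probability of emitting `word`, averaged over the stationary start state
    total = 0.0
    for start, p_start in pi.items():
        p, state = p_start, start
        for sym in word:
            if (state, sym) not in T:
                p = 0.0
                break
            q, state = T[(state, sym)]
            p *= q
        total += p
    return total
def block_entropy(length):
    probs = [word_prob("".join(w)) for w in product("01", repeat=length)]
    return -sum(p * math.log2(p) for p in probs if p > 0)
for length in range(1, 5):
    print(length, block_entropy(length) - length * h_mu)   # converges to E ≈ 0.252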
6. Crypticity (χ)¶
6.1 The Definition¶
Crypticity is the difference between statistical complexity and excess entropy:
\[
\chi = C_\mu - E
\]
6.2 What It Measures¶
χ answers: How much of the stored information is "hidden" from direct observation?
6.3 Intuition: Hidden Information¶
Some processes store information that affects their behavior but isn't directly visible in the output. This "cryptic" information is captured by χ.
Example: Imagine a machine that remembers a secret and only reveals it through subtle correlations. The memory (Cμ) is high, but the visible structure (E) is low. The difference (χ) represents the hidden part.
6.4 Properties¶
- \(\chi \geq 0\) always
- \(\chi = 0\) means all stored information is directly observable
- \(\chi > 0\) means some information is "encrypted" in the state
6.5 Examples¶
Golden Mean:
- \(C_\mu \approx 0.918\), \(E \approx 0.252\)
- \(\chi = 0.918 - 0.252 \approx 0.666\) bits

This means most of the stored information (roughly 73% of \(C_\mu\)) is "cryptic": it affects future behavior but isn't immediately obvious from short observations.
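Plugging in the numbers from the earlier sections makes the split explicit:
c_mu = 0.918   # statistical complexity (Section 3.4)
e = 0.252      # excess entropy (Section 5.5)
chi = c_mu - e
print(f"χ ≈ {chi:.3f} bits ({chi / c_mu:.0%} of the stored information is cryptic)")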
7. The Complexity-Entropy Plane¶
7.1 Visualizing Process Types¶
The Cμ-hμ plane reveals different regimes of behavior:
         Cμ (Statistical Complexity)
          ↑
   High   │   ● Complex             ● Chaotic /
          │     processes             edge-of-chaos
          │
   Med    │   ● Structured          ● Random with
          │     stochastic            structure
          │
   Low    │   ● Simple              ● Pure
          │     periodic              random
          └──────────────────────────────────────→ hμ
             Low                    High   (Entropy Rate)
7.2 Process Archetypes¶
| Location | Example | Cμ | hμ |
|---|---|---|---|
| Bottom-left | Constant (000...) | 0 | 0 |
| Bottom-right | Fair coin | 0 | 1 |
| Top-left | Period-1000 | ~10 | 0 |
| Middle | Golden Mean | ~0.9 | ~0.67 |
7.3 The "Interesting" Region¶
The most interesting processes live in the middle—neither purely random nor purely deterministic. They have:
- Moderate Cμ: Some structure to remember
- Moderate hμ: Some genuine unpredictability
- High E: Rich correlations between past and future
Examples: natural language, biological sequences, financial time series.
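To place your own results on this plane, a small plotting sketch works; note that matplotlib is not part of emic, and the coordinates below are simply the archetype values from the table above:
import matplotlib.pyplot as plt
# (hμ, Cμ) coordinates from the archetype table
processes = {
    "Constant (000...)": (0.0, 0.0),
    "Fair coin": (1.0, 0.0),
    "Period-1000": (0.0, 9.97),
    "Golden Mean": (0.667, 0.918),
}
fig, ax = plt.subplots()
for name, (h, c) in processes.items():
    ax.scatter(h, c)
    ax.annotate(name, (h, c), textcoords="offset points", xytext=(5, 5))
ax.set_xlabel("hμ (bits/symbol)")
ax.set_ylabel("Cμ (bits)")
ax.set_title("Complexity-entropy plane")
plt.show()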
8. Worked Examples¶
8.1 The Biased Coin¶
A coin that shows heads with probability 0.7:
from emic.sources import BiasedCoinSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
source = BiasedCoinSource(p=0.7, _seed=42)
data = TakeN(10_000)(source)
result = CSSR(CSSRConfig(max_history=3)).infer(data)
summary = analyze(result.machine)
print(f"States: {summary.num_states}") # 1
print(f"Cμ = {summary.statistical_complexity}") # 0.0
print(f"hμ = {summary.entropy_rate}") # ~0.88
print(f"E = {summary.excess_entropy}") # 0.0
Interpretation:
- 1 state: No memory helps
- Cμ = 0: Nothing to remember
- hμ = 0.88: Close to maximum entropy (0.88 bits for p=0.7)
- E = 0: No correlations
8.2 The Periodic Process¶
A process that repeats "01":
from emic.sources import PeriodicSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
source = PeriodicSource(pattern=(0, 1))
data = TakeN(1_000)(source)
result = CSSR(CSSRConfig(max_history=3)).infer(data)
summary = analyze(result.machine)
print(f"States: {summary.num_states}") # 2
print(f"Cμ = {summary.statistical_complexity}") # 1.0
print(f"hμ = {summary.entropy_rate}") # 0.0
print(f"E = {summary.excess_entropy}") # 1.0
Interpretation:
- 2 states: "Just saw 0" and "Just saw 1"
- Cμ = 1.0: Need 1 bit to store which state
- hμ = 0: Perfectly predictable
- E = 1.0: Strong past-future correlation
8.3 The Golden Mean Process¶
No consecutive 1s allowed:
from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
source = GoldenMeanSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)
result = CSSR(CSSRConfig(max_history=5)).infer(data)
summary = analyze(result.machine)
print(f"States: {summary.num_states}") # 2
print(f"Cμ = {summary.statistical_complexity:.3f}") # ~0.918
print(f"hμ = {summary.entropy_rate:.3f}") # ~0.667
print(f"E = {summary.excess_entropy:.3f}") # ~0.459
print(f"χ = {summary.crypticity:.3f}") # ~0.459
Interpretation:
- 2 states: "Can emit 0 or 1" vs "Must emit 0"
- Cμ ≈ 0.918: Less than 1 bit (unequal state probabilities)
- hμ ≈ 0.667: Moderate randomness
- E ≈ 0.252: Moderate structure
- χ ≈ 0.667: Most of Cμ is "cryptic"
8.4 The Even Process¶
1s must come in pairs:
from emic.sources import EvenProcessSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
source = EvenProcessSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)
result = CSSR(CSSRConfig(max_history=5, post_merge=True)).infer(data)
summary = analyze(result.machine)
print(f"States: {summary.num_states}") # 2
print(f"Cμ = {summary.statistical_complexity:.3f}") # ~1.0
print(f"hμ = {summary.entropy_rate:.3f}") # ~0.5
print(f"E = {summary.excess_entropy:.3f}") # ~0.5
Interpretation:
- 2 states: "Neutral" vs "Must emit another 1"
- Higher structure, lower randomness than Golden Mean
9. Practical Usage in emic¶
9.1 The analyze() Function¶
from emic.analysis import analyze
summary = analyze(machine)
# Access all measures
summary.statistical_complexity # Cμ in bits
summary.entropy_rate # hμ in bits/symbol
summary.excess_entropy # E in bits
summary.crypticity # χ in bits
# Structural info
summary.num_states # Number of causal states
summary.num_transitions # Number of transitions
summary.topological_complexity # log₂(num_states)
9.2 Individual Functions¶
from emic.analysis import (
statistical_complexity,
entropy_rate,
excess_entropy,
)
c_mu = statistical_complexity(machine)
h_mu = entropy_rate(machine)
e = excess_entropy(machine)
9.3 Comparing Inferred vs True¶
from emic.sources import GoldenMeanSource, TakeN
from emic.inference import Spectral, SpectralConfig
from emic.analysis import analyze
source = GoldenMeanSource(p=0.5, _seed=42)
# Theoretical machine
true_summary = analyze(source.true_machine)
print(f"True Cμ = {true_summary.statistical_complexity:.4f}")
# Inferred machine
data = TakeN(10_000)(source)
result = Spectral(SpectralConfig()).infer(data)
inferred_summary = analyze(result.machine)
print(f"Inferred Cμ = {inferred_summary.statistical_complexity:.4f}")
# Error
error = abs(true_summary.statistical_complexity - inferred_summary.statistical_complexity)
print(f"Error = {error:.4f}")
10. Common Questions¶
10.1 Why is Cμ sometimes called "statistical complexity"?¶
The word "statistical" distinguishes it from other complexity measures (like Kolmogorov complexity). It's computed from the stationary probability distribution, hence "statistical."
10.2 What if Cμ = 0?¶
This means the process is i.i.d. (independent and identically distributed). No memory helps prediction—each symbol is independent of the past.
10.3 What if hμ = 0?¶
This means the process is deterministic. Given the current state, the next symbol is completely predictable.
10.4 Can E > Cμ?¶
No! We always have:
\[
E \leq C_\mu
\]
This is because the past-future information (E) can't exceed the information stored in the causal state (Cμ).
10.5 What's the relationship between these measures?¶
Key equations:
- \(\chi = C_\mu - E\) (definition of crypticity)
- \(E \leq C_\mu\) (structure can't exceed storage)
- \(h_\mu \leq H(X)\) (entropy rate ≤ marginal entropy)
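These identities also make useful sanity checks on any inferred machine. A short sketch reusing the emic calls from the worked examples (same API assumed; the tolerance is arbitrary):
from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
source = GoldenMeanSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)
summary = analyze(CSSR(CSSRConfig(max_history=5)).infer(data).machine)
# χ = Cμ - E and E ≤ Cμ should hold up to numerical error
assert abs(summary.crypticity - (summary.statistical_complexity - summary.excess_entropy)) < 1e-6
assert summary.excess_entropy <= summary.statistical_complexity + 1e-6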
10.6 Which measure should I report?¶
For most purposes, report Cμ and hμ together. They give the complete picture:
- Cμ: The memory/structure side
- hμ: The randomness side
Add E if you're studying correlations, and χ if you're interested in "hidden" structure.
10.7 Are these measures robust to noise?¶
Moderate noise (< 10%) typically:
- Increases Cμ slightly (noise adds spurious states)
- Increases hμ (more apparent randomness)
- Decreases E (correlations are obscured)
Use BitFlipNoise to study robustness:
from emic.sources import GoldenMeanSource, TakeN, BitFlipNoise
source = GoldenMeanSource(p=0.5, _seed=42)
noisy = source >> BitFlipNoise(flip_prob=0.05, seed=123) >> TakeN(10_000)
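From there, you can infer a machine from the noisy stream and compare its measures with the clean baselines from Section 8.3. A sketch continuing the snippet above (assuming, as in the earlier examples, that the pipeline output can be passed straight to .infer()):
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze
result = CSSR(CSSRConfig(max_history=5)).infer(noisy)
summary = analyze(result.machine)
print(f"Cμ = {summary.statistical_complexity:.3f}")   # expect slightly above the clean ~0.918
print(f"hμ = {summary.entropy_rate:.3f}")             # expect slightly above the clean ~0.667
print(f"E  = {summary.excess_entropy:.3f}")           # expect somewhat below the clean value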
Summary¶
| Measure | Symbol | Question Answered | Range |
|---|---|---|---|
| Statistical Complexity | Cμ | How much memory for optimal prediction? | 0 to ∞ |
| Entropy Rate | hμ | How much is irreducibly random? | 0 to log₂(|A|) |
| Excess Entropy | E | How much past-future correlation? | 0 to ∞ |
| Crypticity | χ | How much structure is hidden? | 0 to Cμ |
Key relationships:
- Random processes: Cμ = 0, hμ = max
- Deterministic processes: hμ = 0
- "Interesting" processes: Moderate Cμ and hμ
- Always: E ≤ Cμ, χ = Cμ - E
These measures, developed through computational mechanics, provide a principled vocabulary for describing the complexity of stochastic processes.