emic.experiments¶
The experiments module provides a framework for reproducible algorithm benchmarking.
Command-Line Interface¶
Options¶
| Option | Description |
|---|---|
| `--all` | Run all experiments |
| `--quick` | Quick mode (reduced params, skip slow algorithms) |
| `--parallel N` | Run with N parallel workers |
| `--shard M/N` | Run shard M of N (for distributed execution) |
| `--combine DIR` | Combine sharded results from DIR |
| `--list` | List available experiments |
| `--algorithms` | Comma-separated list of algorithms (e.g., `--algorithms cssr,spectral`) |
| `--timeout` | Per-run timeout in seconds (default: 120) |
| `-o, --output-dir` | Output directory (default: `experiments/runs`) |
| `-q, --quiet` | Suppress progress output |
Core Classes¶
ExperimentRunner
¶
ExperimentRunner(
config: ExperimentsConfig | None = None,
process_registry: ProcessRegistry | None = None,
algorithm_registry: AlgorithmRegistry | None = None,
output_dir: str | None = None,
verbose: bool = True,
shard: tuple[int, int] | None = None,
algorithms_filter: list[str] | None = None,
)
Run benchmark experiments and collect results.
Example

```python
runner = ExperimentRunner()
runner.run_all()                    # Run all default experiments
runner.run_experiment("accuracy")   # Run specific experiment
```
Initialize the benchmark runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | ExperimentsConfig \| None | Benchmark configuration (uses defaults if None) | None |
| `process_registry` | ProcessRegistry \| None | Process registry (uses defaults if None) | None |
| `algorithm_registry` | AlgorithmRegistry \| None | Algorithm registry (uses defaults if None) | None |
| `output_dir` | str \| None | Override output directory | None |
| `verbose` | bool | Print progress to stdout | True |
| `shard` | tuple[int, int] \| None | Optional (shard_index, total_shards) for parallel execution | None |
| `algorithms_filter` | list[str] \| None | Optional list of algorithm names to run (overrides config) | None |
Source code in src/emic/experiments/runner.py
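A sharded, filtered run can be set up entirely through the constructor. The sketch below is illustrative: the import path and the output directory are assumptions; the algorithm name comes from the documentation above.

```python
# Illustrative sketch: run only the spectral algorithm on shard 0 of 4.
from emic.experiments import ExperimentRunner  # import path assumed

runner = ExperimentRunner(
    shard=(0, 4),                     # (shard_index, total_shards)
    algorithms_filter=["spectral"],   # overrides the configured algorithm list
    output_dir="experiments/runs/shard0",
    verbose=False,
)
results = runner.run_all()            # list[BenchmarkResult] for this shard
```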
run_experiment
¶
run_experiment(
experiment: ExperimentConfig,
) -> list[BenchmarkResult]
Run a single experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment` | ExperimentConfig | Experiment configuration | required |

Returns:

| Type | Description |
|---|---|
| list[BenchmarkResult] | List of all results from the experiment |
Source code in src/emic/experiments/runner.py
run_all
¶
run_all() -> list[BenchmarkResult]
Run all experiments in the configuration.
Returns:
| Type | Description |
|---|---|
| list[BenchmarkResult] | List of all results |
Source code in src/emic/experiments/runner.py
run_single_benchmark
¶
run_single_benchmark(
algorithm_info: AlgorithmInfo,
process_info: ProcessInfo,
n_samples: int,
experiment_name: str,
seed: int = 42,
timeout_seconds: int = 120,
) -> list[BenchmarkResult]
Run a single benchmark configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `algorithm_info` | AlgorithmInfo | Algorithm to benchmark | required |
| `process_info` | ProcessInfo | Process to generate data from | required |
| `n_samples` | int | Number of samples to generate | required |
| `experiment_name` | str | Name of the parent experiment | required |
| `seed` | int | Random seed for reproducibility | 42 |
| `timeout_seconds` | int | Maximum time for this run | 120 |

Returns:

| Type | Description |
|---|---|
| list[BenchmarkResult] | List of BenchmarkResult for each metric |
Source code in src/emic/experiments/runner.py
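For ad-hoc measurements, `run_single_benchmark` can be called directly with entries from the default registries. A minimal sketch, assuming the names below are importable from the package root:

```python
# Sketch: benchmark one algorithm on one process, outside a full experiment.
from emic.experiments import (
    ExperimentRunner,
    get_algorithm_registry,
    get_process_registry,
)

runner = ExperimentRunner(verbose=False)
algo = get_algorithm_registry().get("cssr")
proc = get_process_registry().get("even_process")

results = runner.run_single_benchmark(
    algorithm_info=algo,
    process_info=proc,
    n_samples=10_000,
    experiment_name="adhoc",   # hypothetical experiment label
    seed=7,
    timeout_seconds=60,
)
for r in results:
    print(r.metric, r.value, r.ground_truth)
```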
Configuration¶
ExperimentConfig
dataclass
¶
ExperimentConfig(
name: str,
description: str = "",
algorithms: list[str] = (
lambda: ["cssr", "spectral"]
)(),
processes: list[str] = (
lambda: ["even_process", "golden_mean"]
)(),
sample_sizes: list[int] = (
lambda: [1000, 10000, 100000]
)(),
metrics: list[str] = (
lambda: ["state_count", "cmu", "hmu", "duration_s"]
)(),
repetitions: int = 1,
repetitions_by_sample_size: dict[int, int] = dict(),
seed_offset: int = 0,
algorithm_configs: dict[str, dict[str, Any]] = dict(),
timeout_seconds: int = 120,
)
Configuration for a single experiment.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | str | Experiment identifier (e.g., "accuracy") |
| `description` | str | Human-readable description |
| `algorithms` | list[str] | List of algorithm names to benchmark |
| `processes` | list[str] | List of process names to test |
| `sample_sizes` | list[int] | List of N values for data generation |
| `metrics` | list[str] | List of metrics to compute |
| `repetitions` | int | Default number of times to repeat each configuration |
| `repetitions_by_sample_size` | dict[int, int] | Override repetitions per sample size (e.g., {1000: 5, 10000: 3}) |
| `seed_offset` | int | Base seed for random number generation |
| `algorithm_configs` | dict[str, dict[str, Any]] | Per-algorithm config overrides |
| `timeout_seconds` | int | Per-run timeout in seconds |
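A custom experiment can be defined directly as a dataclass instance. The sketch below uses only the fields documented above; the import path and the specific parameter values are assumptions.

```python
# Sketch: a convergence-style experiment with per-sample-size repetitions.
from emic.experiments import ExperimentConfig  # import path assumed

convergence = ExperimentConfig(
    name="convergence",
    description="State-count convergence vs. sample size",
    algorithms=["cssr", "spectral"],
    processes=["golden_mean"],
    sample_sizes=[1_000, 10_000, 100_000],
    metrics=["state_count", "cmu"],
    repetitions=3,
    repetitions_by_sample_size={100_000: 1},  # fewer repeats for the largest N
    timeout_seconds=300,
)
```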
get_repetitions
¶
ExperimentsConfig
dataclass
¶
ExperimentsConfig(
experiments: list[ExperimentConfig],
output_dir: str = "experiments/runs",
quick_mode: bool = False,
quick_sample_sizes: list[int] = (lambda: [1000])(),
)
Top-level experiments configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| `experiments` | list[ExperimentConfig] | List of experiment configurations |
| `output_dir` | str | Directory for results output |
| `quick_mode` | bool | If True, use reduced sample sizes and skip slow algorithms |
| `quick_sample_sizes` | list[int] | Sample sizes to use in quick mode |
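Experiments are grouped under a single top-level configuration. A minimal sketch, assuming the import path; `accuracy_exp` is a hypothetical ExperimentConfig:

```python
# Sketch: wrap one experiment in a top-level configuration.
from emic.experiments import ExperimentConfig, ExperimentsConfig

accuracy_exp = ExperimentConfig(name="accuracy", description="Accuracy vs. ground truth")
config = ExperimentsConfig(
    experiments=[accuracy_exp],
    output_dir="experiments/runs",
    quick_mode=False,
)
```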
from_yaml
classmethod
¶
from_yaml(path: str | Path) -> ExperimentsConfig
Create configuration from a YAML file.
from_dict
classmethod
¶
from_dict(data: dict[str, Any]) -> ExperimentsConfig
Create configuration from a dictionary.
Source code in src/emic/experiments/config.py
get_experiment
¶
get_experiment(name: str) -> ExperimentConfig
Get an experiment by name.
Source code in src/emic/experiments/config.py
load_config
¶
load_config(
path: str | Path | None = None, quick_mode: bool = False
) -> ExperimentsConfig
Load benchmark configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | str \| Path \| None | Path to YAML config file. If None, uses defaults. | None |
| `quick_mode` | bool | If True, use reduced parameter space. | False |

Returns:

| Type | Description |
|---|---|
| ExperimentsConfig | Loaded or default configuration |
Source code in src/emic/experiments/config.py
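Typical usage pairs `load_config` with `ExperimentRunner`. A sketch, assuming the import path; the YAML file path is a hypothetical example:

```python
# Sketch: load configuration (defaults here; a YAML path also works) and run it.
from emic.experiments import ExperimentRunner, load_config

config = load_config(quick_mode=True)               # defaults, reduced parameter space
# config = load_config("experiments/config.yaml")   # hypothetical YAML file

runner = ExperimentRunner(config=config)
results = runner.run_all()
```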
Registries¶
ProcessRegistry
¶
Registry of benchmark processes.
Processes are data sources with known ground truth for validation.
Example

```python
registry = ProcessRegistry()
registry.register(
    name="even_process",
    display_name="Even Process",
    factory=EvenProcessSource,
    parameters={"p": 0.5},
    ground_truth={"state_count": 2, "cmu": 1.0},
)
process = registry.get("even_process")
source = process.create_source(seed=42)
```
Source code in src/emic/experiments/registry.py
register
¶
register(
name: str,
display_name: str,
factory: Callable[..., SequenceSource],
ground_truth: dict[str, float] | None = None,
description: str = "",
parameters: dict[str, Any] | None = None,
) -> None
Register a process.
Source code in src/emic/experiments/registry.py
ProcessInfo
dataclass
¶
ProcessInfo(
name: str,
display_name: str,
factory: Callable[..., SequenceSource],
ground_truth: dict[str, float] = dict(),
description: str = "",
parameters: dict[str, Any] = dict(),
)
Information about a benchmark process.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | str | Unique identifier (e.g., "even_process") |
| `display_name` | str | Human-readable name (e.g., "Even Process") |
| `factory` | Callable[..., SequenceSource] | Callable that creates the source (takes seed as kwarg) |
| `ground_truth` | dict[str, float] | Dictionary of expected metric values |
| `description` | str | Optional description |
| `parameters` | dict[str, Any] | Parameters passed to the factory |
create_source
¶
create_source(seed: int = 42) -> SequenceSource
Create a source instance from the registered factory and parameters.
AlgorithmRegistry
¶
Registry of benchmark algorithms.
Algorithms are inference methods that reconstruct epsilon-machines.
Example

```python
registry = AlgorithmRegistry()
registry.register(
    name="cssr",
    display_name="CSSR",
    factory=CSSR,
    config_class=CSSRConfig,
    default_config={"max_history": 5, "significance": 0.05},
)
algo_info = registry.get("cssr")
algo = algo_info.create_algorithm(max_history=8)
```
Source code in src/emic/experiments/registry.py
register
¶
register(
name: str,
display_name: str,
factory: Callable[..., InferenceAlgorithm],
config_class: type | None = None,
default_config: dict[str, Any] | None = None,
slow: bool = False,
description: str = "",
) -> None
Register an algorithm.
Source code in src/emic/experiments/registry.py
AlgorithmInfo
dataclass
¶
AlgorithmInfo(
name: str,
display_name: str,
factory: Callable[..., InferenceAlgorithm],
config_class: type | None = None,
default_config: dict[str, Any] = dict(),
slow: bool = False,
description: str = "",
)
Information about a benchmark algorithm.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | str | Unique identifier (e.g., "cssr") |
| `display_name` | str | Human-readable name (e.g., "CSSR") |
| `factory` | Callable[..., InferenceAlgorithm] | Callable that creates the algorithm (takes config kwargs) |
| `config_class` | type \| None | Configuration class for the algorithm |
| `default_config` | dict[str, Any] | Default configuration parameters |
| `slow` | bool | Whether this algorithm is slow (skipped in `--quick` mode) |
| `description` | str | Optional description |
create_algorithm
¶
create_algorithm(
**config_overrides: Any,
) -> InferenceAlgorithm
Create a new algorithm instance with merged config.
Source code in src/emic/experiments/registry.py
Result Schema¶
BenchmarkResult
dataclass
¶
BenchmarkResult(
experiment: str,
algorithm: str,
process: str,
n_samples: int,
metric: str,
value: float,
ground_truth: float | None = None,
error: str | None = None,
timestamp: datetime = (lambda: now(UTC))(),
)
A single benchmark measurement.
Represents one algorithm run on one process configuration, measuring one metric. Multiple BenchmarkResults form a complete benchmark run.
Attributes:
| Name | Type | Description |
|---|---|---|
| `experiment` | str | Experiment identifier (e.g., "accuracy", "convergence") |
| `algorithm` | str | Algorithm name (e.g., "cssr", "spectral", "bsi") |
| `process` | str | Process name (e.g., "even_process", "golden_mean") |
| `n_samples` | int | Number of samples used for inference |
| `metric` | str | Metric name (e.g., "cmu", "hmu", "state_count", "duration_s") |
| `value` | float | Measured value |
| `ground_truth` | float \| None | Expected value if known, None otherwise |
| `error` | str \| None | Exception message if run failed, None otherwise |
| `timestamp` | datetime | When this measurement was recorded |
to_dict
¶
Convert to dictionary for DataFrame construction.
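Since `to_dict` is intended for DataFrame construction, a list of results converts to pandas directly. A sketch, assuming pandas is installed and `results` is a `list[BenchmarkResult]`:

```python
# Sketch: tabulate benchmark results with pandas.
import pandas as pd

df = pd.DataFrame([r.to_dict() for r in results])
ok = df[df["error"].isna()]                                   # drop failed runs
print(ok.groupby(["algorithm", "metric"])["value"].mean())    # mean per algorithm/metric
```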
RunMetadata
dataclass
¶
RunMetadata(
timestamp: datetime,
git_commit: str,
git_dirty: bool,
python_version: str,
emic_version: str,
cli_args: list[str],
duration_seconds: float | None = None,
completed: bool = False,
)
Metadata for a complete benchmark run.
Captures environment and configuration for reproducibility.
Attributes:
| Name | Type | Description |
|---|---|---|
| `timestamp` | datetime | When the run started |
| `git_commit` | str | Git commit hash (short form) |
| `git_dirty` | bool | Whether working directory had uncommitted changes |
| `python_version` | str | Python version string |
| `emic_version` | str | emic package version |
| `cli_args` | list[str] | Command-line arguments used |
| `duration_seconds` | float \| None | Total run duration |
| `completed` | bool | Whether all experiments finished successfully |
to_dict
¶
Convert to dictionary for YAML serialization.
Source code in src/emic/experiments/schema.py
ResultsWriter
¶
ResultsWriter(
base_dir: str | Path,
shard: tuple[int, int] | None = None,
run_dir: Path | None = None,
)
Write benchmark results to disk.
Creates timestamped directories with Parquet data and YAML metadata. Updates a 'latest' symlink for convenient access.
Example

```python
writer = ResultsWriter(base_dir="experiments/results")
writer.add_result(result1)
writer.add_result(result2)
writer.finalize(metadata)
```

Creates:

```
experiments/results/2026-01-26T14-32-05/
├── metadata.yaml
└── results.parquet
```

For sharded runs:

```python
writer = ResultsWriter(base_dir="experiments/results", shard=(0, 4))
```

This creates `results_shard0.parquet` instead of `results.parquet`.
Initialize writer with output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_dir` | str \| Path | Base directory for results (e.g., "experiments/results") | required |
| `shard` | tuple[int, int] \| None | Optional (shard_index, total_shards) for sharded output | None |
| `run_dir` | Path \| None | Optional explicit run directory (for sharded runs sharing a dir) | None |
Source code in src/emic/experiments/schema.py
results_filename
property
¶
Get the results filename, accounting for sharding.
add_result
¶
add_result(result: BenchmarkResult) -> None
add_results
¶
add_results(results: list[BenchmarkResult]) -> None
save_incremental
¶
Save current results incrementally.
Useful for long-running benchmarks to preserve partial results.
Source code in src/emic/experiments/schema.py
finalize
¶
finalize(metadata: RunMetadata) -> Path
Write final results and metadata, update 'latest' symlink.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | RunMetadata | Run metadata to save | required |

Returns:

| Type | Description |
|---|---|
| Path | Path to the results directory |
Source code in src/emic/experiments/schema.py
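A complete write cycle combines the writer with RunMetadata. The sketch below is illustrative: field values such as the commit hash and version string are placeholders, and the import path is assumed.

```python
# Sketch: persist results and run metadata, then update the 'latest' symlink.
import sys
from datetime import UTC, datetime

from emic.experiments import ResultsWriter, RunMetadata

writer = ResultsWriter(base_dir="experiments/runs")
writer.add_results(results)          # results: list[BenchmarkResult]
writer.save_incremental()            # optional: keep partial results on disk

meta = RunMetadata(
    timestamp=datetime.now(UTC),
    git_commit="abc1234",            # placeholder short hash
    git_dirty=False,
    python_version=sys.version.split()[0],
    emic_version="0.0.0",            # placeholder version string
    cli_args=sys.argv[1:],
    duration_seconds=123.4,
    completed=True,
)
run_dir = writer.finalize(meta)
print("Results written to", run_dir)
```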
read_results
¶
read_results(path: str | Path) -> DataFrame
Read benchmark results from Parquet or JSON.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | str \| Path | Path to results.parquet or results.json | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with benchmark results |
Source code in src/emic/experiments/schema.py
read_latest_results
¶
read_latest_results(base_dir: str | Path) -> DataFrame
Read results from the 'latest' run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `base_dir` | str \| Path | Base results directory (e.g., "experiments/results") | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with benchmark results |
Source code in src/emic/experiments/schema.py
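Results from the most recent run can be loaded back as a DataFrame for analysis. A sketch, assuming the import path and that the DataFrame columns mirror the BenchmarkResult fields:

```python
# Sketch: inspect failures from the latest run.
from emic.experiments import read_latest_results

df = read_latest_results("experiments/runs")
failed = df[df["error"].notna()]
print(failed[["algorithm", "process", "n_samples", "error"]])
```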
Functions¶
get_process_registry
¶
get_process_registry() -> ProcessRegistry
Get the default process registry (lazy-initialized).
Source code in src/emic/experiments/registry.py
get_algorithm_registry
¶
get_algorithm_registry() -> AlgorithmRegistry
Get the default algorithm registry (lazy-initialized).