Chapter 7: Six Encodings, One Interface

We built the H₂ Hamiltonian with Jordan–Wigner. But JW has a problem — its Z-chains grow linearly with system size. This chapter asks: can we do better? And if we use a different encoding, do we get the same physics?

In This Chapter


Why Would We Want a Different Encoding?

In Chapter 6, we saw that 4 of the 15 Pauli terms — the exchange terms XXYY, XYYX, YXXY, YYXX — are off-diagonal. These are the terms that generate coherences in the density matrix and produce the correlation energy. They are also the terms that are expensive to simulate on a quantum computer, because each one requires a CNOT staircase proportional to its Pauli weight.

Under Jordan–Wigner, the worst-case Pauli weight of an operator grows linearly with system size: $O(n)$. For H₂ with 4 qubits, the maximum weight is 4 — manageable. But for H₂O with 12 active spin-orbitals (frozen core), it’s 12. For the FeMo-co nitrogen-fixation catalyst with ~100 qubits, it’s ~100. Each off-diagonal term would require ~200 CNOT gates.

This is the motivation for alternative encodings: can we represent the same physics with shorter Pauli strings?

Chapter 5 showed that Bravyi–Kitaev achieves $O(\log_2 n)$ weight using a Fenwick tree, and ternary tree encodings achieve $O(\log_3 n)$. But a natural question arises: if the Pauli strings are different, how do we know we’re still computing the same molecule?

The answer is that all valid encodings preserve the canonical anti-commutation relations (CAR): ${a_p^\dagger, a_q} = \delta_{pq}$. Any encoding that faithfully maps the CAR from fermionic operators to Pauli operators must produce a Hamiltonian with the same spectrum — because the eigenvalues of the Hamiltonian depend only on the algebraic relations among the operators, not on the specific matrix form. Different encodings are related by a unitary change of basis on the qubit Hilbert space, and unitary transformations preserve eigenvalues. This is the mathematical guarantee that the physics is encoding-independent.


Do Different Encodings Give the Same Answer?

Let’s find out empirically. Build the H₂ Hamiltonian with all six encodings and compare:

let h2Factory key = h2Integrals |> Map.tryFind key

for (name, encoder) in encoders do
    let ham = computeHamiltonianWith encoder h2Factory 4u
    let terms = ham.DistributeCoefficient.SummandTerms
    printfn "%-25s  %d terms" name terms.Length
Jordan-Wigner              15 terms
Bravyi-Kitaev              15 terms
Parity                     15 terms
Balanced Binary Tree       15 terms
Balanced Ternary Tree      15 terms
Vlasov Tree                15 terms

Same number of terms. Same identity coefficient ($-1.0704$ Ha in every case). And if we diagonalize each — as we’ll do in Chapter 9 — the eigenvalues agree to machine precision ($< 5 \times 10^{-16}$ Ha).

For H₂ with 4 qubits, the Pauli strings are actually identical across all six encodings. With only 4 qubits, there isn’t enough room for the different encodings to diverge — the Z-chain is at most length 3, and BK’s tree structure on 4 nodes provides at most a marginal improvement.

This is an important lesson: for small molecules, encoding choice barely matters. The differences emerge at scale. But the equivalence — the fact that they all give the same answer — holds at every scale. Let’s understand why.


Why Equivalence Holds: The CAR Theorem

An encoding maps fermionic operators $a_p^\dagger$, $a_p$ to qubit operators (Pauli strings). Different encodings produce different Pauli strings for the same ladder operator. But the second-quantized Hamiltonian

\[\hat{H} = \sum_{pq} h_{pq}\, a_p^\dagger a_q + \frac{1}{2}\sum_{pqrs} \langle pq \mid rs\rangle\, a_p^\dagger a_q^\dagger a_s a_r\]

is defined in terms of the algebra of the operators — their products and commutation relations — not their specific matrix representations.

The key property that any valid encoding must preserve is the canonical anti-commutation relations (CAR):

\[\{a_p^\dagger, a_q\} = \delta_{pq}, \qquad \{a_p^\dagger, a_q^\dagger\} = 0, \qquad \{a_p, a_q\} = 0\]

Here is the theorem:

If an encoding preserves the CAR, then the encoded Hamiltonian has the same eigenvalues as the original fermionic Hamiltonian.

Why? Because the Hamiltonian is a polynomial in the operators $a_p^\dagger$ and $a_p$. If two representations of these operators satisfy the same algebraic relations (the CAR), then any polynomial in them produces the same algebraic object — and therefore the same eigenvalues. This is a standard result in operator algebra: the CAR define the algebra up to unitary equivalence.

In practice, this means the encoding is a unitary change of basis on the qubit Hilbert space. The eigenvalues of a matrix are invariant under unitary conjugation — the spectrum is preserved exactly.

FockMap’s test suite verifies the CAR at the Pauli string level for every encoding: it checks that

\[(\text{encoded } a_i^\dagger)(\text{encoded } a_j) + (\text{encoded } a_j)(\text{encoded } a_i^\dagger) = \delta_{ij} \cdot I\]

for all pairs $(i, j)$, using symbolic Pauli multiplication. If this check passes, the encoding is algebraically valid and the encoded Hamiltonian is guaranteed to preserve the spectrum in principle. In practice, we still compare eigenvalues numerically in Chapter 9 to verify the full pipeline implementation — because algebraic correctness of the encoding does not protect against bugs in integral processing, coefficient assembly, or like-term combination.

This is not just a theoretical nicety. It is what makes encoding choice a free optimization: you can pick the encoding that minimizes circuit cost, knowing — with mathematical certainty, not just empirical evidence — that the physics is preserved.


Six Encodings in Six Lines

With equivalence established, let’s see all six. Chapter 5 introduced three encoding concepts — JW, BK, and ternary trees. FockMap ships six concrete implementations, because the path-based tree framework (which we’ll see shortly) makes it trivial to define new tree shapes:

let encoders = [
    ("Jordan-Wigner",         jordanWignerTerms)
    ("Bravyi-Kitaev",         bravyiKitaevTerms)
    ("Parity",                parityTerms)
    ("Balanced Binary Tree",  balancedBinaryTreeTerms)
    ("Balanced Ternary Tree", ternaryTreeTerms)
    ("Vlasov Tree",           vlasovTreeTerms)
]

All six have the same function signature. All six produce PauliRegisterSequence. Swapping one for another is a one-word change. The first five will be familiar from Chapter 5; the sixth — the Vlasov tree — is a second ternary tree encoding that we’ll introduce in the frameworks section below.


Where the Differences Emerge: Scaling

The differences show up in Pauli weight at larger system sizes:

$n$ JW BK Ternary Tree JW/TT ratio
4 4 3 3 1.3×
8 8 4 4
16 16 5 5 3.2×
32 32 6 5 6.4×

At 32 spin-orbitals, JW’s worst-case operator touches 32 qubits. The ternary tree touches 5. Since each Pauli rotation costs $2(w-1)$ CNOT gates (Chapter 4), this is the difference between 62 CNOTs and 8 CNOTs per term — compounded across every term and every Trotter step.

Recall from Chapter 6 that the off-diagonal terms are the expensive ones. The encoding determines how many qubits those off-diagonal terms touch — and therefore the circuit depth required to create and maintain the coherences that capture the correlation energy.


The Three Frameworks

FockMap implements encodings through two complementary frameworks (plus one special case):

flowchart TD
    subgraph ISF["Index-Set Framework"]
        direction LR
        JW["Jordan-Wigner"] ~~~ BK["Bravyi-Kitaev"] ~~~ PAR["Parity"]
    end

    subgraph PBF["Path-Based Framework"]
        direction LR
        BBT["Balanced Binary"] ~~~ BTT["Balanced Ternary"] ~~~ VLT["Vlasov"] ~~~ CT["Custom"]
    end

    ISF --> PS["PauliRegisterSequence"]
    PBF --> PS
    style PS fill:#d1fae5,stroke:#059669

Index-set framework (MajoranaEncoding.fs): An encoding is defined by three set-valued functions — Update($j$), Parity($j$), and Occupation($j$). JW, BK, and Parity are each defined in 3–5 lines of F#.

Path-based framework (TreeEncoding.fs): An encoding is derived from a labelled rooted tree. Any tree topology works — strictly more general than the index-set framework.

Fenwick-specific (BravyiKitaev.fs): BK uses hand-derived bit-manipulation formulas specific to the Fenwick tree. Faster than generic tree traversal, same result.

Design note: The index-set framework was introduced by Seeley, Richard, and Love (2012) as a unifying abstraction. Our investigation showed that it produces correct encodings only for star-shaped (depth-1) trees — a previously undocumented constraint. The path-based framework (Jiang et al., 2020) removes this restriction. FockMap provides both because the index-set framework is simpler and faster when it applies.

Two ternary trees: why tree shape matters

The path-based framework’s generality has a concrete payoff: FockMap ships two ternary tree encodings that achieve the same optimal $O(\log_3 n)$ worst-case Pauli weight but use different tree shapes — the balanced ternary tree (Jiang et al., 2020) and the Vlasov tree (Vlasov, arXiv:1904.09912). Both preserve the CAR, both produce the same eigenvalues, but they assign qubits to tree nodes differently.

How does one go from a tree shape to a working encoding? That’s the subject of the next chapter, where we’ll build the Vlasov tree step by step — define the tree, plug it into the framework, and verify that it works.


Defining Custom Encodings

FockMap supports two routes for custom encodings beyond the six built-in options:

Index-set route. Define three set-valued functions — Update, Parity, Occupation — and pass them as an EncodingScheme. This works for star-shaped trees (JW, Parity) and is the simplest path.

Tree-based route. Define a tree shape and plug it into the path-based framework. This is strictly more general and is the subject of the next chapter, where we build the Vlasov tree from scratch.

Warning: Not every custom scheme produces a valid encoding. Always verify the CAR before trusting results — Chapter 8 shows how.


A Decision Framework

Situation Recommended encoding Reasoning
Learning / prototyping Jordan–Wigner Simplest to understand and debug
Small system ($n \leq 16$) Jordan–Wigner Weight overhead is manageable
1D chain / local interactions Jordan–Wigner Adjacent-orbital terms have short Z-chains
General-purpose ($n \leq 100$) Bravyi–Kitaev $O(\log_2 n)$ weight, well-studied
Minimum circuit depth Ternary Tree or Vlasov Tree $O(\log_3 n)$ — best known asymptotic scaling
Exploring custom topologies Path-based Arbitrary tree shapes supported
Comparing ternary tree variants Vlasov Tree Same scaling, different qubit assignment
Comparing multiple encodings All six FockMap’s interchangeable interface makes this trivial

Key Takeaways

Common Mistakes

  1. Assuming different Pauli strings mean different physics. They don’t — if the CAR is preserved, eigenvalues are preserved. Different strings, same physics.

  2. Choosing an encoding by term count. All encodings produce the same number of Hamiltonian terms. The difference is Pauli weight per term, which determines CNOT count.

  3. Using a custom encoding without verifying CAR. An encoding that violates anti-commutation will produce plausible but wrong Hamiltonians. Always test.

Exercises

  1. Weight table. Run the scaling comparison for $n = 64$ and $n = 128$. At what point does the ternary tree’s maximum weight exceed 10?

  2. CAR verification. Define a custom encoding and use FockMap’s test infrastructure to check whether it satisfies anti-commutation. What happens if it doesn’t?

  3. Encoding comparison for H₂O. Build the H₂O/STO-3G Hamiltonian (12 active spin-orbitals, frozen core) with all six encodings. Compare maximum and average Pauli weight.

Further Reading


Previous: Chapter 6 — Building the Qubit Hamiltonian

Next: Chapter 8 — Building a Tree Encoding